ABSTRACT: In the past, several point mutations have been introduced individually into the substrate-binding
site of α-lytic protease (EC 3.4.21.12) and shown to affect its specificity in a predictable manner [Bone,
R., Silen, J. L., & Agard, D. A. (1989) Nature 339, 191-195]. One of the resulting mutant enzymes
(Met190Ala in the numbering system of Fujinaga et al.) [Fujinaga, M., Delbaere, L. T. J., Brayer, G. D.,
& James, M. N. G. (1985) J. Mol. Biol. 183, 479-502] cleaves at large hydrophobic residues. We chose
this enzyme as the parent for a library of mutant proteases. The library was constructed by effecting
combinatorial random substitution of up to four other residues (Gly191, Arg192, Met213, and Val218)
thought likely to influence the primary specificity of the protease. Active enzymes in the library were
screened with a range of synthetic substrates (encompassing 19 different amino acids in the P1 position)
in order to evaluate their primary cleavage preferences. The amino acid sequences of active mutants
revealed a strong preference for the replacement of Met213 with a His residue. This substitution also had
the greatest observed effect on specificity, conferring a greatly increased and, in some cases, dominant
ability to cleave at His residues in synthetic amide substrates. Mutant enzymes with greatly increased
proteolytic activity were also found in the library.

The aim of enzyme engineering is to generate proteins with
new and useful functional properties. The ability to alter the
substrate specificity of an enzyme would be particularly
advantageous, and many attempts have been made to change
this property using site-directed mutagenesis. While knowledge-based
engineering has been rewarding, its success has
been limited by large gaps in the current understanding of
protein folding and protein-ligand interactions (e.g., Craik et
al. (1985), Wilks et al. (1988, 1990), Rutter et al. (1987),
Henderson et al. (1991), and Alexander et al. (1991)). In a
complementary approach, combinatorial random substitution
has been used to generate libraries of variant proteins that
contain a proportion of functional mutants (Reidhaar-Olson
& Sauer, 1988; Lim & Sauer, 1989, 1991), these often having
enzymatic properties different from those of the parent
(Oliphant & Struhl, 1989; Dunn et al., 1988; Dunn & Jennings,
1992). Proteases constitute an industrially useful group of
enzymes for which substrate specificity is of the utmost
importance. While purely rational approaches have been used
to change the substrate specificities of several proteases (e.g.,
Craik et al. (1985), Wells and Estell (1988), Beaumont et al.
(1992), Wilson et al. (1991), Carter and Wells (1987), Khouri
et al. (1991), and Hedstrom et al. (1992)), there are few
examples of combinatorial random substitution being applied
to this task (Evnin et al., 1990; Teplyakov et al., 1992). To
determine whether proteases of novel specificity could be

† This work was sponsored by the Australian Government, Dept. of
Industry, Technology & Commerce, in conjunction with Peptide Technology
Ltd. and Burns Philp Ltd., under the Industry Research &
Development Generic Biotechnology Program (Agreement No. 14026).

* Author to whom correspondence should be addressed.

‡ Present address: Chief, C.S.I.R.O. Division of Tropical Animal
Production, Private Bag No. 3, P.O. Indooroopilly, Queensland 4068,
Australia.



generated in this way, we simultaneously randomized several
of the amino acid residues thought to influence the primary
specificity of a serine protease.

The protease chosen for manipulation was α-lytic protease,
a serine protease secreted by the soil bacterium Lysobacter
enzymogenes (Whitaker, 1970). The structure of the protease,
which has been determined to high resolution (Fujinaga et
al., 1985), shows the enzyme to be a chymotrypsin homologue.
Like elastase, α-lytic protease preferentially cleaves on the
C-terminal side of small uncharged residues such as Ala
(Kaplan et al., 1970; Bauer et al., 1981). The residues
responsible for this primary cleavage specificity may be
deduced from several crystallographic structures of enzyme-inhibitor
complexes (Bone et al., 1989a, 1991a,b). Small
amino acid residues such as Ala are preferred at the scissile
bond because the pocket that accommodates the substrate P1
residue is shallow,¹ largely due to the presence of two bulky
Met residues at this subsite (Met190 and Met213, in the revised
numbering system of Fujinaga et al. (1985)). Replacement
of either of these with Ala residues results in a mutant enzyme
that prefers large hydrophobic residues in the P1 position (Bone
et al., 1989b). Since we felt that an enlarged S1 pocket allowed
a greater scope for substitution at other positions contributing
to this subsite, we selected one of these mutants (Met190Ala;
see Figure 1) as the parent for a protease library.

Crystallographic structures reveal that the S1 pocket of the
Met190Ala mutant (Bone et al., 1991a) is largely defined by

¹ In the nomenclature of Schechter and Berger (1967), the substrate
residue immediately N-terminal to the scissile bond is termed the P1
residue, the one before that is the P2 residue, and so on. Residues
C-terminal to the scissile bond are termed P1', P2', etc. Cognate
residue-binding subsites in the protease are identified by use of the
letter S in place of P, e.g., S1, S1', etc.

FIGURE 1: Stereoview of the active site of the Met190Ala mutant
of α-lytic protease. This enzyme was used as the parent for the
library described in this article. The viewer looks directly into the
S1 binding pocket, in which a substrate P1 side chain (Phe) is shown;
the four residues in the protein targeted for random substitution are
shown by thick lines. This diagram is derived from file 1P05 in the
Brookhaven Protein Data Bank (Abola et al., 1987).

two segments of polypeptide, which we labeled A and B:

          segment A                          segment B

     190     192     194        213     215         218   220   226
   - Ala Gly Arg Gly Asp - // - Met Ser Gly Gly Asn Val - Asn - Ser -
          *   *                  *                   *

Structural information and sequence homology with related
enzymes indicate that the residues shown in boldface italics
are essential to the overall structure of the protein and suggest
that those shown in lightface italics may also be important.
Position 190 and our reason for choosing Ala at this position
have been discussed above. We did not consider any of the
four remaining residues at the S1 subsite (marked with
asterisks) to be essential to the overall architecture of the
protein. Our library was therefore constructed using targeted
random mutagenesis to effect simultaneous random substitutions
at some or all of the positions 191, 192, 213, and 218
(Figure 1).

EXPERIMENTAL PROCEDURES 

Materials. The limited range of commercially available
amino acid-chromophore conjugates meant that our suite of
synthetic substrates could not all be made with the same leaving
group. In consequence, screening was done using some
chromogenic substrates of the form SucAlaProXaa-pNa² and
some fluorogenic substrates of the form SucAlaProXaa-βNap
(where Xaa denotes any natural amino acid residue or D-Ala).
With the exception of SucAlaProAla-pNa (Peptide Institute,
Japan), these substrates were synthesized from
(tert-butyloxycarbonyl)-AlaPro and the appropriate amino acid-pNa or
-βNap, succinylated using succinic anhydride, and purified
by reversed-phase HPLC. In each case, the composition was
confirmed by amino acid analysis. Peptides with P1 residues
as follows were verified as enzyme substrates by digestion
with the enzyme named: Ala, D-Ala, Gly, Leu, Met, and Val,
wild-type α-lytic protease; Arg, bovine trypsin (Sigma); Asp
and Glu, endoprotease Glu-C (Boehringer Mannheim); Phe,
α-chymotrypsin (Sigma); and Pro, proline-specific endoprotease
(Seikagaku Kogyo, Japan). Substrate purity was
estimated in each case from the total amount of p-nitroaniline
released. Any additional peptide substrates were purchased
from Bachem. Molecular biology and related procedures were
according to Sambrook et al. (1989), unless otherwise specified.

² Abbreviations: Suc, succinyl; pNa, p-nitroanilide; βNap,
β-naphthylamide.

Cloning and Sequencing. We used PCR (95 °C, 30 s;
[...], 30 s; 72 °C, 3 min; 30 cycles) and primers with the
sequences 5'-ATTTATGCATGCCGATCAGGTCGATCCTCAG-3'
and 5'-TCTCATCGATCTATTAACCCGTGACCAGGCTCAGGCC-3'
in the presence of 7-deaza-2'-dGTP (Innis, 1990) to amplify
the segment encoding the pro and enzymatic regions of the
α-lytic protease gene (Silen et al., 1988) from L. enzymogenes
genomic DNA. After it was end-filled with Klenow, the
blunt-ended fragment was cloned into Bluescript (Stratagene Inc.).
Dideoxy sequencing of DNA was done with Taq polymerase using
end-labeled primers in the presence of 7-deaza-2'-dGTP; the
best results were obtained when undenatured DNA (250 ng) was
used as the template for the Promega "fmol" linear amplification
method (95 °C, 30 s; 70 °C, 30 s; 30 cycles). After verification
of the cloned sequence, a fragment encoding the pelB secretion
leader (Lei et al., 1987) was inserted upstream of the protease
gene, and the resulting cassette was ligated into pBS(+)
(Stratagene Inc.) so that it would be expressed by the indigenous
lac promoter (Silen et al., 1989; K. D. Haggett, unpublished
results).

Mutagenesis. A BalI-EcoRI fragment (0.36 kb) that
encoded the active-site region of the protease was subcloned
into M13mp19, and the oligonucleotide
5'-CCGAATCGCCGCGGCCCGCGCAGGCGTTGC-3' was used (Nakamaye
& Eckstein, 1986; Sayers et al., 1988) to introduce the mutation
Met190Ala. Further mutagenesis of the Met190Ala mutant
was effected using the oligonucleotide
5'-TTGCCGTTGGACTGTGAGTTGCCGCTGCTCATCACGCCCTGC-3'
to introduce an EcoB restriction site (TGAN₈TGCT) into
the region encoding the S1 pocket, which resulted in two amino
acid substitutions (Gly215Ser and Val218Ser). Replacement
of the corresponding portion in the pBS(+) expression
construct with the mutated BalI-EcoRI fragment gave us
the construct to be used as the template for the library. The
EcoB site served as a genetic marker and later allowed us to
select against the unmutated template sequence (Carter, 1991).
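The location of the EcoB marker within the second mutagenic oligonucleotide can be checked with a short script. The following is only an illustrative sketch: the recognition pattern TGA-N8-TGCT and the oligonucleotide sequence are taken from the text above, while the function names are hypothetical and both strands are searched because the site is asymmetric.

```python
import re

# EcoB recognition sequence as quoted in the text: TGA, eight arbitrary bases, TGCT.
ECOB_SITE = re.compile(r"TGA[ACGT]{8}TGCT")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_ecob_sites(seq: str) -> list[tuple[str, int, str]]:
    """Return (strand, 0-based start, matched text) for each site on either strand."""
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for match in ECOB_SITE.finditer(s):
            hits.append((strand, match.start(), match.group()))
    return hits


# Marker-introducing oligonucleotide quoted in the text (written 5'->3').
marker_oligo = "TTGCCGTTGGACTGTGAGTTGCCGCTGCTCATCACGCCCTGC"
sites = find_ecob_sites(marker_oligo)
print(sites)
```

Read against the oligonucleotide as written, the single match spans the triplets that are complementary to codons 218 through 214, which is consistent with the site directing the Gly215Ser and Val218Ser changes.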

Oligo-A, which had the sequence
5'-CACGAACCGCCCGAATCGCC(G/C)NN(G/C)NNCGCGCAGGCGTTGCCCTGG-3',
was designed to bind to the sense strand encoding
segment A in the library template and to effect random
substitutions at positions 191 and 192. Oligo-B, which had
the sequence
5'-CAGTTGTTGCCGTTGGACTG(G/C)NNGTTGCCGCCGCT(G/C)NNCACGCCCTGCGCCTGGOCGG-3',
was designed to bind to the sense strand encoding
segment B in the library template and to effect random
substitutions at positions 213 and 218, as well as to restore
Gly at position 215. The codon format NN(G/C) ensured that
each amino acid could be represented and eliminated two of
the three stop codons (Dunn et al., 1988).
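The claim about the NN(G/C) codon format can be verified by enumerating the 32 codons it permits against the standard genetic code. The sketch below assumes the standard code (written out in the conventional TCAG ordering); the variable names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (TCAG order).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# NN(G/C): any base at positions 1 and 2, G or C at position 3 (sense strand).
nns_codons = [a + b + c for a in BASES for b in BASES for c in "GC"]

encoded = {CODON_TABLE[codon] for codon in nns_codons}
amino_acids = sorted(encoded - {"*"})
stop_codons = sorted(c for c in nns_codons if CODON_TABLE[c] == "*")

print(len(nns_codons), "codons")
print(len(amino_acids), "amino acids:", "".join(amino_acids))
print("stop codons retained:", stop_codons)
```

Because the oligonucleotides are complementary to the sense strand, each randomized codon appears in them as the reverse-complement triplet (G/C)NN.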

Mutagenic priming by oligo-B destroyed the EcoB selection
sequence, allowing direct selection for incorporation of this
oligonucleotide. To effect targeted random mutagenesis,
existing methods (Foss & McClain, 1987; Kramer & Fritz,
1987; Inouye & Inouye, 1991) were adapted to maximize
coupled priming. A sample of double-stranded library
template construct was digested with EcoNI and EcoRI to
remove a small region (0.2 kb) around the target sequence
(i.e., the DNA encoding segments A and B), while another
sample was cleaved at one position only using HindIII. About
460 ng of the EcoNI-EcoRI fragment (4.3 kb) and 50 ng of
the HindIII fragment were denatured together (100 °C, 3
min) and reannealed (65 °C, 10 min, and then cooled to 0 °C
over 30 min) in 10 µL of 100 mM Tris containing 500 mM
KCl, pH 8, to generate approximately 50 ng of gapped duplex.
At this stage, 8.5 pmol of phosphorylated oligo-A and 8.5
pmol of phosphorylated oligo-B were added, and the mixture
was incubated at 65 °C for 3 min and then cooled to 0 °C over
30 min. T4 DNA polymerase (0.17 unit) and E. coli DNA
ligase (0.85 unit) were added, and extension/ligation was
achieved by incubation in 22.4 µL of 50 mM Tris containing
18 µg/mL T4 Gene 32 protein, 0.19 mM NAD+, 0.23 mM
each dNTP, 4.7 mM dithiothreitol, 35 mM KCl, 56 mM
ammonium acetate, and 4.7 mM MgCl2, pH 8, at 28 °C for
3.3 h. The DNA was ethanol precipitated and electroporated
into E. coli HB2155 (Carter et al., 1985) using a BioRad gene
pulser (five 40-µL aliquots at 2.5 kV, 25 µF, 600 Ω, with an
electrode gap of 0.2 cm). Constructs retaining the EcoB site
(i.e., those that had failed to incorporate oligo-B) were
destroyed in this E. coli B strain. Following discharge, cells
were allowed to recover in 1 mL of SOC broth (Sambrook et
al., 1989) for 50 min at 28 °C before the titer of primary
transformants was measured. The SOC outgrowths were then
pooled and used as the inoculum for 50 mL of L-broth
containing 0.2 mg/mL ampicillin and 2% (w/v) glucose, which
was agitated at 25 °C for 21 h (A = 3.9). Plasmid isolation
using a Qiagen-100 column (Diagen GmbH) yielded 67.5
µg of library DNA. Electroporation of E. coli JM109 with
a small portion (100 ng) of this provided a sample of
transformants in an amber suppressor strain, where the only
nonsense codon possible at randomized triplets (TAG) would
be suppressed.
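The quantities given for the annealing step imply a large molar excess of each oligonucleotide over the gapped duplex. A rough estimate is sketched below; it assumes an average mass of 650 g/mol per base pair (a common approximation, not a value from the text) and treats the gapped duplex as a fully double-stranded 4.5-kb molecule (the 4.3-kb fragment plus the 0.2-kb gap).

```python
# Assumption: average molar mass of one base pair of double-stranded DNA.
AVG_BP_MASS_G_PER_MOL = 650.0


def dsdna_pmol(mass_ng: float, length_bp: int) -> float:
    """Convert a mass of double-stranded DNA (ng) to picomoles of molecules."""
    moles = (mass_ng * 1e-9) / (length_bp * AVG_BP_MASS_G_PER_MOL)
    return moles * 1e12


# Values from the text: about 50 ng of gapped duplex, 8.5 pmol of each oligo.
duplex_pmol = dsdna_pmol(mass_ng=50.0, length_bp=4500)
molar_excess = 8.5 / duplex_pmol

print(f"gapped duplex: {duplex_pmol:.4f} pmol")
print(f"oligo : duplex molar ratio: about {molar_excess:.0f} to 1")
```

Under these assumptions each oligonucleotide is present at roughly a 500-fold molar excess over the template.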

Screening Methods. Preliminary screening of transformants
was done by culturing colonies at 25 °C on L-agar
plates (pH 7.2) containing 2% skim milk powder, where
secretion of active protease resulted in localized clearing of
the opaque growth medium due to degradation of the casein.
After 11 days, the ratio of the diameter of the cleared zone
to that of the colony was converted to an index value on a scale
of 0-20. This screening procedure (and those
described below) can only identify enzymes that retain activity
long enough after secretion to exert a detectable effect. In
consequence, there is a bias toward active enzymes that also
possess reasonable stability.

The substrate preferences of active proteases were first
characterized in a qualitative fashion by a substrate overlay
method, as follows. Colonies were grown in separate compartments
of multiwell plates (such as Nunclon Δ Multidishes)
on LAmp-broth (pH 7.5, 50 µg/mL ampicillin) that had been
solidified with 1.5% (w/v) Seaplaque (FMC Marine Colloids).
Growth for 10 days at 25 °C produced sufficient enzyme for
sensitive detection. At this time, each well was filled with 1
mL of a molten solution (40 °C) containing 2 mM pNa or
10 mM βNap substrate, 1% (w/v) Seaplaque, and 11%
N,N-dimethylformamide in 100 mM Hepes buffer, pH 8.0. Plates
were incubated at 37 °C and inspected repeatedly over a period
of 48 h to estimate visually the release of yellow p-nitroaniline
or fluorescent β-naphthylamine (using illumination at 366
nm for the latter). Finally, to enhance detection of low levels
of p-nitroaniline or β-naphthylamine, plates were developed
by diazotization (Ohlsson et al., 1986) or by reaction with
Fast Blue Salt BN (Barrett, 1972), respectively. Transformants
were scored on a nonlinear scale of 0-20 to reflect the
observed rate of substrate hydrolysis. The partly logarithmic
nature of the scale emphasizes weak activities.

Initial rates of substrate hydrolysis in solution were measured
spectrophotometrically at 410 nm for pNa (Erlanger et al.,
1961) and at 340 nm for βNap substrates (Lee et al., 1971).
Because reaction rates only afford a true reflection of kcat/Km
when the substrate concentration is small relative to Km, we
used the lowest substrate concentrations consistent with
workable assay durations. Reactions were conducted at
25 °C in 100 mM Hepes, pH 8, containing 9%
N,N-dimethylformamide and pNa substrate (0.5 mM) or βNap
substrate (5 mM). Under these conditions, the amount of
enzyme activity releasing 1 µmol of chromophore/fluorophore
per minute was defined as 1 unit.
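The reasoning behind using low substrate concentrations can be made explicit with the Michaelis-Menten equation: when [S] is much smaller than Km, the rate reduces to (kcat/Km)[E][S], and the linear form overestimates the true rate by the fraction [S]/Km. The sketch below illustrates this; the kinetic constants are hypothetical values chosen only for the demonstration and are not taken from the article.

```python
def michaelis_menten_rate(kcat, km, enzyme, substrate):
    """Full Michaelis-Menten initial rate: kcat*[E]*[S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


def low_substrate_rate(kcat, km, enzyme, substrate):
    """Linear approximation (kcat/Km)*[E]*[S], valid when [S] << Km."""
    return (kcat / km) * enzyme * substrate


def overestimate_fraction(km, substrate):
    """Fractional overestimate of the true rate by the linear form: [S]/Km."""
    return substrate / km


# Hypothetical constants for illustration only (concentrations in mM).
kcat, km, enzyme = 10.0, 5.0, 1e-6
for substrate in (0.5, 5.0):
    true_rate = michaelis_menten_rate(kcat, km, enzyme, substrate)
    approx_rate = low_substrate_rate(kcat, km, enzyme, substrate)
    print(f"[S] = {substrate} mM: approx/true = {approx_rate / true_rate:.2f}")
```

With these illustrative numbers the linear form is within about 10% of the true rate at 0.5 mM substrate but is twofold too high once [S] equals Km.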

RESULTS 

Construction of the Library. The gene in the expression
construct used as the template for the library encoded the
Met190Ala variant of α-lytic protease and also contained a
selectable genetic marker that directed the additional substitutions
Gly215Ser and Val218Ser. When the template
construct was cultured in E. coli JM109 on skim milk plates,
the triple-mutant enzyme it produced (Met190Ala/Gly215Ser/
Val218Ser) was shown to be inactive. Molecular modeling
(not shown) had indicated that this would be the case, since
the side chain of Ser215 protrudes into the space normally
occupied by the main-chain atoms of the substrate residue.
It was advantageous that unmutated template constructs
escaping the genetic selection did not give rise to active enzymes
in the library. The targeted random mutagenesis procedure
was designed to remove the inactivating mutation at position
215 and to randomize up to four other positions (namely, 191,
192, 213, and 218) at the S1 subsite. Combinatorial replacement
by all 20 amino acids at each of the target positions
permits a total of 1.6 × 10⁵ permutations. The mutagenesis
procedure generated a library containing 9.7 × 10⁴ primary
transformants.
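The library-size figures can be put in context with a simple calculation. The sketch below reproduces the 20⁴ count and, as an idealized estimate only, the fraction of variants expected to be sampled at least once if every variant were equally likely and every transformant were randomized at all four positions. Neither assumption holds exactly for the library described here, so the coverage values are at best upper bounds.

```python
import math

N_POSITIONS = 4
protein_variants = 20 ** N_POSITIONS   # amino acid combinations
codon_variants = 32 ** N_POSITIONS     # NN(G/C) codon combinations


def expected_coverage(n_clones: float, n_variants: int) -> float:
    """Expected fraction of equally likely variants seen at least once
    among n_clones independent draws (Poisson approximation)."""
    return 1.0 - math.exp(-n_clones / n_variants)


primary_transformants = 9.7e4  # value reported in the text
protein_cov = expected_coverage(primary_transformants, protein_variants)
codon_cov = expected_coverage(primary_transformants, codon_variants)

print(f"amino acid combinations: {protein_variants}")
print(f"codon combinations:      {codon_variants}")
print(f"idealized coverage (amino acid level): {protein_cov:.2f}")
print(f"idealized coverage (codon level):      {codon_cov:.2f}")
```

The amino acid-level figure treats all residues as equally probable, which the unequal codon degeneracy of NN(G/C) does not strictly allow; the codon-level figure is the more conservative of the two.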

Active Enzymes in the Library. Preliminary tests with
endoprotease Glu-C, endoprotease Lys-C, and trypsin confirmed
that even proteases with narrow substrate specificities
could be detected using the skim milk screen. When 8.4 ×
10³ clones from the library were cultured in E. coli JM109
on skim milk plates, 0.57% of the colonies expressed active
enzymes. All 47 of the active enzymes in this sample of the
library hydrolyzed casein with activities equal to or greater
than that of the parent (Table I), and a number (such as
mutants 1 and 2) were exceptionally active in this assay.

Using wild-type and Met190Ala protease (WT and MA,
Table I), the relative magnitudes of scores from the
plate-overlay screening procedure were shown to be in general
agreement with those of reaction rates determined
spectrophotometrically.³ Rate values for βNap substrates had to be
adjusted before inclusion to compensate for the different (and
usually slower) rate of hydrolysis of the leaving group, whereas
this difference was accommodated in substrate overlays by
using different scoring systems for βNap and pNa substrates.
Reaction rates at the chosen substrate concentrations were
shown to provide a reliable reflection of the relative magnitudes
of specificity constants (kcat/Km) published for the test enzymes
(Wilson et al., 1991; Bone et al., 1989b).



³ An apparent discrepancy arises at the upper limit to the scale for
plate screen scores, where very strong activities are all awarded the
maximum value of 20 points. While discrimination between strong
reactions is not possible in the plate screen due to rapid color or fluorescent
saturation of the culture wells, such restrictions do not apply to the
spectrophotometric rates. Thus with mutant 55 it is possible for
spectrophotometric rates as varied as 1437 units/L (P1 = Leu), 4207
units/L (P1 = Ala), 10 741 units/L (P1 = Phe), and 13 674 units/L (P1
= Met) all to be awarded the same (maximum) score in the screen data.

Table I: Activity Data for Reference Enzymes and Enzymes from the Library

[Table body not recoverable from the source. Surviving column headings: no., data, SM, and substrate P1 residues (G, dA, A, ...), with a further group of columns headed "βNap substrates, P1 as shown".]

Table footnotes (legible portions): Substrate P1 residues and amino acid residues at target positions [...]. [...] in this table had detectable activity with substrate SucAlaProPro-pNa, there is no column for P1 = Pro. Enzyme identifier: [...] MA, Met190Ala mutant (library parent); numbers, library enzymes. Sc: Plate screen scores (substrate overlay) on pseudo-log scale. Ra: [...] of published [...] (s⁻¹ M⁻¹) with SucAlaAlaProXaa-pNa (where Xaa denotes an [...]) from Wilson et al. (1991) and Bone et al. (1989b). SM: Index of skim milk clearing (see text). Adj: The adjustment [...] number of each sequence in the 47 active library enzymes sequenced. [...] at [...] °C, pH 9. nd: not determined. [...]: stable over at least 80 days. +, value below 1 (number omitted to improve clarity). [...] mutations Gln219His and ΔSer219a also present.



With this information in hand, all 47 of the active enzymes
in the library sample were screened by substrate overlay (Table
I). In addition, 13 transformants that expressed active
enzymes were cultured in rich liquid medium. Since all of the
liquid cultures contained comparable levels of mutant enzyme
protein (as judged by HPLC) and zymograms (not shown)
confirmed the absence of other proteases, samples of cell-free
supernatants were used directly in spectrophotometric assays
to measure reaction rates. The scaling factors used to adjust
the rates of βNap substrates (calculated by comparing the
activities of each mutant on the SucAlaProAla-pNa and
SucAlaProAla-βNap substrates) are shown in Table I. As
before, spectrophotometric rates and screening scores were in
broad agreement. Assays that monitored the hydrolysis of
ester substrates (benzyloxycarbonyl)-AlaLys-OMe and
(benzyloxycarbonyl)-AlaCys-OMe by measurement of proton
release (R. G. Whittaker, unpublished results) failed to find
any library enzymes (among the 13) with a major capacity
for cleavage at the two natural P1 residues not already tested
(data not shown). Moreover, spectrophotometric measurement
of the rate of hydrolysis of tetrapeptide
SucAlaAlaProLys-pNa by the Met190Ala enzyme and library enzymes
7, 9, and 55 confirmed that none had a major ability to cleave
this substrate (data not shown).
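The leaving-group adjustment described above amounts to a per-enzyme rescaling. The sketch below shows one way it could be expressed; it assumes the factor is the ratio of the pNa rate to the βNap rate for the Ala substrate pair (the text does not state the direction of the ratio), and all function names and numerical values are hypothetical.

```python
def adjustment_factor(rate_pna_ala: float, rate_bnap_ala: float) -> float:
    """Scaling factor for one enzyme, assumed here to be the ratio of its rate
    on SucAlaProAla-pNa to its rate on SucAlaProAla-bNap."""
    if rate_bnap_ala <= 0:
        raise ValueError("bNap reference rate must be positive")
    return rate_pna_ala / rate_bnap_ala


def adjust_bnap_rates(bnap_rates: dict[str, float], factor: float) -> dict[str, float]:
    """Put bNap-substrate rates on the pNa scale by applying the enzyme's factor."""
    return {residue: rate * factor for residue, rate in bnap_rates.items()}


# Hypothetical rates (units/L) for a single enzyme, for illustration only.
factor = adjustment_factor(rate_pna_ala=400.0, rate_bnap_ala=100.0)
adjusted = adjust_bnap_rates({"His": 50.0, "Tyr": 2.5}, factor)
print(factor, adjusted)
```

Because the factor is computed separately for each enzyme, any enzyme-specific difference in how the two leaving groups are handled is absorbed into the adjustment.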

From Table I it is evident that about one-[...] of the enzymes
in the library sample exhibited substrate preferences that
reflected the specificity of the parent protease, giving the best
scores with the Met and Phe substrates and good scores with
Ala and Leu. The remaining enzymes constituted a distinct
group that showed greater selectivity and preferred to cleave
the His and Met substrates. In the enzymes of both groups,
additional (but lesser) cleavage capabilities were featured to
different extents. There were also large differences between
the overall levels of activity exhibited by individual enzymes,
with some (e.g., mutants 41 and 55) showing rates 20-45
times faster than that of the library parent or wild-type
protease. Some activity data for selected enzymes have been
plotted to illustrate the functional diversity contained in the
library (Figure 2). In general, however, we found that enzymes
in the library had little or no ability to act on charged P1
residues (His being predominantly uncharged at the pH of
the assay) or to accommodate those which were sterically
unusual (such as D-Ala and Pro).

FIGURE 2: Activity data for the library parent (MA) and two enzymes
found in the library (mutants 55 and 9). Three histograms are shown
for each enzyme: (A) [...] (hatched bars) and substrate overlays (solid
bars); (B) linear plot of the spectrophotometric rate data (expressed as
units of activity per liter of supernatant); (C) log10 plot of the
spectrophotometric rate data. In [...], horizontal axes show substrate P1
residues in one-letter codes; dA is D-Ala. Vertical axes without
markings relate to the first marked scale to the left. The partly
logarithmic nature of the plate-screen scores (explained in Experimental
Procedures) is evident. Logarithmic scales serve to emphasize weak
activities.

Table II: Inactive Combinations(a)

                 position
   clone    191    192    213    218
   103      Pro    Arg    Thr    Arg
   104      Val    Lys    Met    His
   107      Gly    Arg    Met    Cys
   108      Tyr    Lys    His    Ala
   109      Pro    Gly    Met    Leu
   110      Val    Asn    Gln    Ala
   112      Gly    Arg    Arg    Gln
   113      Tyr    Lys    Leu    Arg
   114      Gly    Arg    Pro    Ser
   121      Gly    Arg    Asn    Ser
   124      His    Asn    Lys    Asn
   128      Gly    Tyr    Arg    Ala
   129      Gln    Cys    Leu    Lys
   146      Lys    Val    Pro    Thr

(a) Column 1 contains the clone identification numbers. Each
combination was observed only once.

Substitutions in the Library. DNA sequencing of 21 clones
selected at random from the library (not shown) revealed that
29% of the transformants were invalid, mostly because of
partial incorporation of oligo-B (resulting in destruction of
the EcoB selectable marker but retention of the inactivating
mutation Gly215Ser). Some 52% of the transformants were
valid clones that resulted from coupled priming by oligo-A
and oligo-B (changes possible to positions 191, 192, 213, and
218), while the remaining 19% of the transformants were
valid clones that resulted from mutagenic priming by oligo-B
alone (changes possible to positions 213 and 218 only).
Sequence data (not shown) revealed that the incorporation of
nucleotides at randomized positions was somewhat biased,
with different nucleotides being favored in different positions
in an unpredictable fashion (cf. Dunn et al. (1988) and
Oliphant and Struhl (1989)). Nevertheless, the extent of
nucleotide substitution was sufficient for adequate diversity
to occur at the amino acid level (see below), and the biases
cannot have been seriously limiting because many clones
encoding active enzymes featured substitutions poorly represented
in the library. We presume that the biases inherent
to the library served mostly to reduce its yield of active mutants.
As statistically expected from the frequency of active enzymes
in the library (0.57%), all 21 clones picked at random encoded
inactive enzymes. Amino acid combinations for the valid
members of this sample are shown in Table II.
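The statement that finding no active enzymes among 21 random clones was statistically expected can be checked directly. The short sketch below treats each clone as an independent draw with a 0.57% chance of being active (the frequency reported for the skim milk screen); the function name is illustrative.

```python
def prob_all_inactive(active_fraction: float, n_clones: int) -> float:
    """Probability that none of n_clones independent random picks is active."""
    return (1.0 - active_fraction) ** n_clones


# Values from the text: 0.57% active clones, 21 clones sequenced at random.
p_none_active = prob_all_inactive(0.0057, 21)
print(f"P(no active clone among 21) = {p_none_active:.3f}")
```

The probability is close to 0.89, so a sample of this size would be expected to contain no active clones about nine times out of ten.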

The substitutions present in each of the 47 active enzymes
from the library sample (see above) are listed in Table I.
About 75% of the active enzymes resulted from mutagenic
priming by oligo-B alone (changes possible to positions 213
and 218 only). The remaining active mutants resulted from
coupled priming by oligo-A and oligo-B (changes possible to
positions 191, 192, 213, and 218). Many residue combinations
were observed more than once (Table I) to the extent that
about one-half of the active enzymes were sequence replicates.
Only slight differences (not shown) were observed in the
substrate preference profiles of mutants having the same
sequence. We thought it useful to have independent clones
of particular mutant genes, since the existence of replicates
would to some extent offset the effects of any spurious
mutations in library clones. However, one of the few
unplanned changes detected (Ser219aAla) was found to occur
in two clones having the same S1 sequence (mutants 8 and
11). Interestingly, the only other spurious mutation observed
(Gln219His/ΔSer219a) affected the same position in the
protein, which was located in a surface loop adjacent to the
S1 site. While changes in this region are thought to be unlikely
to influence enzymatic activity, we have been careful to draw
our main conclusions about structure-activity relationships
from trends rather than from data for individual mutants.

Structure-Activity Relationships. Figure 3 compares the
frequency of substitutions observed in the active enzymes with
data for substitutions in the (inactive) clones chosen at random
from the library. While there was some bias in the distribution
of residues in the randomly chosen clones, representation of
the 20 alternatives was broadly comparable. In contrast,
selection for biological function of the mutants resulted in
substitutions being confined to more limited sets of amino
acids. In active enzymes, Pro (which might be disruptive to
structure) was avoided at all four of the positions open to
substitution. Given that His should be largely unprotonated,
charged residues were scarce. The lack of charged substituents
was surprising, considering that all four positions have access
to solvent and that one of them (position 192) is occupied by
Arg in the wild-type and parental sequences. Cys was also
avoided in active enzymes, perhaps because disulfide bond
formation in the protein is vulnerable to interference from
additional Cys residues. In this connection, we note that
mutants 31, 43, and 50 (identical but for the substitution of
Leu, Ile, and Met, respectively, at position 218) were strongly
active, whereas mutant 107 (identical to the previous mutants
but for the substitution of Cys at position 218) was completely
inactive (Tables I and II).

In addition to the general trends described above, each of
the four randomized positions in active enzymes displayed
characteristic substitution preferences. Position 191 showed
a very strong preference for Gly (as found in the wild-type
sequence) and accommodated other small residues with a
frequency inversely related to their size, which suggested that
this position was subject to tight steric constraints. Position
192 was quite permissive, allowing polar and nonpolar
substitutions of very different sizes. Position 213 accommodated
a limited set of residues, with a strong preference for
His. Remarkably, all of the enzymes in the substrate-selective
group that preferred to cleave His and Met substrates (Table
I) were found to contain His at position 213. The remainder
of the active mutants, which had broad substrate specificity,
all contained residues other than His at this position, with
Met (the wild-type residue) occurring most frequently.
Position 218 accepted a range of residues, but had a strong
preference for Leu. A more detailed examination (Figure
3b) revealed that the strong preference for Leu at position
218 was a feature of enzymes that did not contain a His at
position 213. For His213 mutants, the preference for Leu at
position 218 was slight, and there was a notable increase in
the occurrence of small polar residues such as Ser and Asn.
Spectrophotometric rate data for the activity of mutant
enzymes on substrates with different P2 (and, in some cases,
P3 and P4) residues are presented in Figure 4. The correlations
presented in this figure indicate that different mutants
responded similarly to the same change in the P2 residue (and,
where tested, in the P3 and P4 residues). This strongly suggests
that subsites S2-S4 have not been greatly affected by the amino
acid substitutions we introduced in the S1 subsite. In contrast,
a comparison of reaction rates obtained for the action of each
enzyme on SucAlaProAla-pNa and SucAlaProAla-βNap
substrates (Adj, Table I) revealed differences in how mutants
(other than those of identical sequence) responded to the switch
in chromophore. This might be an indication of changes to
the S1' subsite caused unintentionally by our substitutions in
the S1 pocket. However, enzyme-substrate interactions
C-terminal to the scissile bond are considered to be of limited
importance in α-lytic protease (Bauer et al., 1981).

FIGURE 3: Frequency of substitutions in the library. (a) Each double
panel refers to one of the target positions (Pos) in the protein and
shows the incidence (%) of amino acid substituents (a.a.) in active
library enzymes (upper panel), along with data for clones chosen at
random from the library (lower panel). Amino acids are indicated
using one-letter codes, and the wild-type residue for each position is
underlined. The number in the top right-hand corner of each panel
indicates the sample size (i.e., number of clones). Each upper panel
is a histogram of the actual amino acid frequencies observed in active
enzymes, excluding for positions 191 and 192 data from enzymes
that were unmutated in segment A. In view of the more limited
sample size for clones picked at random, each lower panel shows an
amino acid distribution calculated from the observed nucleotide
incorporation data for the corresponding codon. The histograms of
actual amino acid substitutions (not shown) reflect these plots. (b)
This double panel uses a similar format to show the incidence of
amino acid substituents at position 218 in active enzymes having residues
other than His at position 213 (upper panel), along with the
corresponding incidence in those having His at position 213 (lower
panel).




6256 Biochemistry, Vol. 32, No. 24, 1993



Graham et al. 




[Figure 4 plot. x-axis: rate with screening substrate (units/liter)]



FIGURE 4: Effect of S1 mutagenesis on S2-S4 binding sites. Using
12 library enzymes of different sequences, rates (expressed as units
of activity per liter of supernatant) were measured with four pNa
substrates, and each result was plotted against the rate obtained for
the same enzyme with the same concentration of the corresponding
screening substrate (0.5 mM). Cognate substrates were SucAla-[illegible]
(□), and SucGlyGlyPhe-pNa (a) for screening substrates SucAla-
ProAla-pNa, SucAlaProVal-pNa, SucAlaProPhe-pNa, and Suc[illegible],
respectively. In order to fit all of the data on the
same graph, rate values for the following substrates have been
multiplied by the factors shown: SucGlyGlyPhe-pNa, 5[illegible]; Suc-
AlaAlaVal-pNa, 3000; SucAlaProVal-pNa, 400.

A number of cell-free supernatants showed a steady decline 
in activity during the period of study (Table I). One of the
most unstable enzymes (mutant 55) has broad specificity and 
very high activity, which suggests that autolysis may play a 
role in this process. 

DISCUSSION 

We have demonstrated that combinatorial random sub- 
stitution of a few chosen positions in the substrate-binding 
site of a protease, followed by screening with multiple 
substrates, is a useful method for production and identification 
of mutant enzymes of altered structure and function. This 
approach is not limited to proteases and can be applied to any 
enzyme for which a structural model is available and for which 
appropriate assays can be devised. In fact, the approach we 
have adopted should be applicable in an iterative manner to 
enzymes for which structural information is limited. 4 

In our experiment, 0.57% of the library constructs expressed
active enzymes, indicating the presence of some 550 active
mutants in total. However, DNA sequencing revealed that
29% of the primary transformants were invalid (because of
retention of Gly215Ser or for other reasons), indicating that
the true frequency of active mutants must be about 0.8%.
Moreover, sequencing also disclosed a level of duplication
among clones that suggested the number of different active
mutants in our library was closer to 300. Although most of
the valid mutants in the library resulted from coupled priming



4 If an initial library is made by the randomization of a substantial
number of different positions that may be involved in substrate binding,
then a very low yield of active mutants should result. The sequences of
these mutants will reveal which positions are largely restricted to the
wild-type residue (such as position 191 in our study). A [illegible]
library in which such positions are fixed as the wild-type residue, with
the remaining positions [illegible]
be much more productive in terms of active mutant enzymes and, for that
reason, afford useful functional diversity.



by oligo-A and oligo-B, such double mutants (with four
positions affected) retained activity much less often than
mutants resulting from oligo-B alone (with only two positions
affected). In consequence, while all of the active mutants in
the library contained substitutions in the positions making
the greater contribution to the S1 pocket (positions 213 and
218; Figure 1), only a limited number contained additional
changes at the other two positions. This was a favorable
situation for analyzing structure-activity relationships in the
S1 subsite.
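
The correction from the observed to the true frequency of active mutants quoted in this paragraph is a one-step calculation. The sketch below reproduces it; the variable names are illustrative, and the only inputs are the two percentages given in the text (0.57% active constructs, 29% invalid transformants). It assumes, as the text implies, that active enzymes arise only from valid constructs.

```python
# Inputs taken from the text; the names are illustrative only.
observed_active_fraction = 0.0057   # 0.57% of library constructs were active
invalid_fraction = 0.29             # 29% of primary transformants were invalid

# Assumption: active enzymes come only from valid constructs, so the
# frequency among valid constructs is the observed frequency divided
# by the valid share of the library.
valid_fraction = 1.0 - invalid_fraction
true_active_fraction = observed_active_fraction / valid_fraction

print(f"{100 * true_active_fraction:.2f}% of valid constructs are active")
```

The result rounds to the "about 0.8%" stated above.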

Calculations indicated that 11.9% of the oligo-A sequences
and 2.3% of the oligo-B sequences were compatible with
enzymatic function. 5 This result strongly suggests that (when
considered as pairs) positions 191 and 192 are much more
tolerant of substitution than positions 213 and 218, an
observation that is not directly apparent from independent
consideration of substitutional preferences at each of these
sites (Figure 3). The difference could be a reflection of the
fact that positions 213 and 218 contribute considerably more
to the surface that defines the S1 pocket (Figure 1). Overall,
the positions we investigated in α-lytic protease display a
tolerance to substitution slightly greater than that calculated
elsewhere for substrate-binding residues in β-lactamase
(Palzkill & Botstein, 1992).
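
The two compatibility figures can be retraced from the numbers given in footnote 5. The following is a sketch of that arithmetic rather than the authors' own procedure; the variable names are illustrative, and the oligo-A step assumes (as the footnote's reasoning implies) that the two oligonucleotides contribute independently to activity.

```python
# Figures quoted in footnote 5; variable names are illustrative.
active = 0.0057             # fraction of all library constructs that were active
share_b_only = 0.75         # share of active enzymes primed by oligo-B alone
share_a_and_b = 0.25        # share of active enzymes from coupled priming
valid_b_only = 0.19         # fraction of constructs: valid, oligo-B alone
valid_a_and_b = 0.52        # fraction of constructs: valid, coupled priming

# Fraction of oligo-B sequences compatible with enzymatic function.
compat_b = active * share_b_only / valid_b_only

# The footnote carries the rounded value (2.3%) into the second step,
# so the quoted figure is used here instead of the unrounded one.
compat_b_quoted = 0.023

# Assumption: a double mutant is active only if both segments are
# compatible, and the two contributions are independent.
compat_a = active * share_a_and_b / (valid_a_and_b * compat_b_quoted)

print(f"oligo-B: {100 * compat_b:.2f}%  oligo-A: {100 * compat_a:.1f}%")
```

Using the unrounded oligo-B value in the second step would give a slightly higher oligo-A figure (about 12%), which does not change the conclusion drawn in the text.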

Because self-processing of the precursor at a Thr-Ala
junction is required for release of the active mature protease
(Silen et al., 1989), we expected that all of the active enzymes
in our library would have a significant ability to cleave at Thr
residues. However, we found that the activity of the wild-
type protease and the library parent on a synthetic substrate
with Thr as the P1 residue was slight and that some of the
enzymes in the library were even less active with this substrate.
It seems that processing of the precursor junction by these
enzymes is still sufficiently fast for good expression of the
mature protease, perhaps due to the unimolecular nature of
the reaction or due to unusual strain at this bond. Another
surprise was the poor correlation between the activity of library
enzymes on skim milk and on synthetic substrates (r² = 0.8
at best; data not shown). Transformants with the greatest
capacity for milk clearing during growth (mutants 1 and 2)
were not those with the highest overall activity when overlaid
with synthetic substrates (mutants 41 and 55). Since the
substrate preferences of the former group were not directed
toward amino acids common in casein, and since we have
evidence that our mutagenesis has not affected interactions
at the other subsites critical to substrate binding (Figure 4),
we are unable to offer a satisfying explanation for this
phenomenon.

In general, residue changes in the protease that were
compatible with enzymatic activity had only modest effects
on substrate specificity (Table I). The one exception was the
substitution of Met213 with its strongly favored replacement,
His. This substitution severely reduced the (otherwise large)
capacity of the enzyme for cleavage at Phe and conferred
instead a greatly increased ability to cleave the fluorogenic
substrate having His in the P1 position. Neither the wild-type
α-lytic protease nor the library parent (Met190Ala) displayed
5 Oligo-B: Since 75% of the active enzymes (i.e., 75% of 0.57% of
clones) resulted from priming by oligo-B alone, and 19% of the library
constructs were valid clones that resulted from mutagenic priming by
oligo-B alone, we concluded that 2.3% of the oligo-B sequences were
compatible with enzymatic function. Oligo-A: Since 25% of the active
mutants (i.e., 25% of 0.57% of clones) resulted from coupled priming by
oligo-A and oligo-B, but only 2.3% of oligo-B sequences would be expected
to allow activity, and 52% of the library constructs were valid clones that
resulted from coupled priming, we calculated that 11.9% of the oligo-A
sequences were compatible with enzymatic function.

actually the preferred P1 residue for some of the Met213His
mutants (e.g., mutant 9). Thus it appears that the presence
of His at a key position can cause an abrupt switch
in primary specificity.



[illegible], a recently discovered homologue of α-lytic protease
[illegible], contains a His residue at position
213 (Svendsen et al., 1991) and cleaves preferentially at Glu
in protein and synthetic substrates (Yoshida et al., 1988).
[illegible] molecular modeling predicts no charged [illegible]
region at the pH used. However, since the
[illegible]
protein or synthetic substrates is unknown, we are unable
to compare it further with our Met213His [illegible]. We do,
however, note that S. griseus protease E contains a Ser at
position 190. This invites speculation that the preference for
His at position 213 is a response to the presence of a small
residue at position 190 and would not have been observed if
we had used wild-type α-lytic protease (rather than the
Met190Ala variant) as the parent for our library.

In addition to their propensity for cleaving [illegible],
all Met213His mutants efficiently cleaved the synthetic
substrate with Met as the P1 residue, further supporting the
presumption that His213 is largely uncharged. Uncharged
His has polar aromatic character. Attention has recently been
[illegible] interactions of His with other aromatic residues
[illegible]
His-His and His-Met interactions have not been studied in
[illegible] our Met213His mutants of α-lytic protease,
molecular modeling suggests that it is mainly [illegible]
edge of the His213 imidazole that contacts the substrate P1
residue, regardless of the precise orientation of the ring (L.
D. Graham, unpublished results). In the more common
histidine tautomer, i.e., the [illegible]-imino form (Walters &
[illegible]), resonance theory (and Hückel molecular
orbital calculations) predicts the [illegible] edge to be electron-
[illegible]
few uncharged P1 side chains capable [illegible]
electron-rich atom (the tertiary nitrogen and thioether sulfur,
respectively) in good contact with this potential [illegible]
region. The observed substrate preference of our
[illegible]
favorable polar interactions at the S1 subsite.
We suggest from a comparison of mutants 7, 9, 14, 22, and
[illegible]
([illegible], respectively, at position 218) that the overall activity of
the His213 enzymes may increase in response to increased
[illegible] (Table I). Inspection of the
[illegible]
at least loosely, to non-His213 enzymes as well. However, it
[illegible]
function correlation. While other residue substitutions in
active enzymes resulted in substantial [illegible]
observed in some minor activities (particularly in the ability
[illegible]
not show any systematic dependence upon the sizes or


hydrophobicity indices of the S1 substituents (not shown).
We presume in such cases that many individual steric and
electrostatic considerations combine to determine changes in
specificity and that they do so in a nonadditive way that exceeds
our present ability to model and understand them.

[illegible]
diversity likely to be attained by this [illegible]
that other mutants of altered primary specificity can be
obtained by mutating a different combination of residues in
the S1 subsite, e.g., by randomizing position 190 in place of
position 191. Moreover, we suggest [illegible]
His at position 213 with random [illegible]
other amenable S1 positions (e.g., positions 190, 192, and 218)
is likely to have a large proportion of His-cleaving proteases
and may well include [illegible] with tighter P1 specificity for
His. Work is in progress to test these hypotheses. In the
meantime, we are engaged in the characterization of the more
interesting of the existing library mutants in greater detail.
The mutants most likely to find application as
enzymes (e.g., as tools in peptide mapping) are those with
high activity and good selectivity, such as mutants 9 and 1.
In addition, some library mutants (e.g., mutants 41 and 55)
display a large increase in protease activity over wild-type
α-lytic protease and the library parent. Enzymes such as
these, with very high activity and broad specificity,
could have applications as general-purpose reagents for protein
degradation.

A new strategy for combinatorial mutagenesis was developed and applied to residues 40 through 60 of LamB
protein (maltoporin), with the aim of identifying amino acids important for LamB structure and function. The
strategy involved a template containing a stop codon in the target sequence and a pool of random degenerate
oligonucleotides covering the region. In vitro mutagenesis followed by selection for function (Dex+, ability to
utilize dextrins) corrected the nonsense mutation and simultaneously forced incorporation of a random
mutation(s) within the region. The relative importance of each residue within the target was indicated by the
frequency and nature of neutral and deleterious mutations recovered at each position. Residues 41 through 43
in LamB accepted few neutral substitutions, whereas residues 55 through 57 were highly flexible in this regard.
Consistent with this finding was that the majority of defective mutants were altered at residues 41 to 43.
Characterization of these mutants indicated that the nature of residues 41 to 43 influenced the amount of stable
protein in the outer membrane. These results, as well as the conserved nature of this stretch of residues among
outer membrane proteins, suggest that residues 41 to 43 of LamB play an important role in the process of outer
membrane localization.



Combinatorial mutagenesis studies are becoming important
in the identification of important residues of proteins,
including transport proteins (3, 15, 23, 25, 26). In this study,
we developed a novel strategy for introducing random
mutations into a defined region of maltoporin (LamB pro- 
tein), with the aim of identifying residues significant in 
contributing to structure, assembly, and sugar transport. 
The strategy used with LamB can be applied to the muta- 
genesis of any gene coding for a selectable phenotype and 
has considerable advantages over cassette mutagenesis 
methods. 

Maltoporin in the outer membrane of Escherichia coli has
been favored as a model in studies of phage (lambda) binding 
(4) and sugar channel selectivity (1) as well as protein export 
and assembly (9, 20). Amino acid residues which play an 
important role in phage lambda and sugar binding have been 
identified (4, 14). However, the events leading to the local- 
ization and assembly of LamB trimers into the outer mem- 
brane are still not clear. In addition to the signal sequence, 
specific regions in the mature protein may also play an 
important role in the biogenesis process (9, 20) but remain to
be convincingly identified. 

There are several lines of evidence to suggest that residues
40 through 60 of mature LamB are of structural importance.
Short in-frame deletions overlapping residues 39 to 49 of
mature LamB resulted in the formation of unstable protein
that was rapidly degraded (24), possibly as a result of
incorrect outer membrane routing. Apart from its potential
role in localization, the N-terminal third of the sequence has
also been postulated to be important for sugar selectivity and
channel formation (14, 29) and is highly conserved in enteric
organisms. Residues 40 through 60 of LamB in Escherichia
coli, Salmonella typhimurium, Shigella spp., and Klebsiella
pneumoniae are identical except for one substitution (29).
Residues 40 to 49 also constitute part of a conserved region 
identified by Nikaido and Wu (22) (Fig. 1) in outer membrane 
proteins such as LamB, OmpA, and OmpF. Hence, this 
region is potentially a more generally important structural 
feature in outer membrane proteins, although this conclusion 
has been questioned in studies of OmpA deletion mutants
(13). However, no detailed genetic point mutation analysis of 
these sequences has been undertaken, and this study con- 
centrates on amino acid replacements to test whether the 
nature of particular residues is significant in LamB biogen- 
esis and function. 

The detailed three-dimensional structure of LamB is not 
yet available, but a model of folding across the outer 
membrane has been derived, as illustrated in Fig. 1 (4). This 
model predicted that residues 40 to 53 constitute an amphipathic
β-strand and residues 54 to 60 are part of a surface-exposed
loop. A second aim of this mutagenesis study was to
test the secondary structure predictions and indicate 
whether residues 54 to 60 are indeed more flexible in 
accepting substitutions than the predicted transmembrane 
segment involving residues 40 to 53. 

MATERIALS AND METHODS 

Strains, phages, and plasmids. E. coli K-12 strains were
used in this study. Strain pop6510 [F- thr leu metA lacY
tonA supE recA56 srl::Tn10 lamB (dex-5) (2)] has a chromosomal
lamB null mutation and was used as the host strain for
characterization of plasmid phenotype; BW2800 [araD139
Δ(argF-lac)U169 rpsL relA (thi) ptsF25 flbB deoC1 rbsR
malB Δ(malK-lamB)15 [illegible] (10)] was used in
single-stranded DNA synthesis;
CJ236 [dut-1 ung-1 thi-1 relA1/pCJ105(Cmr) (16)] was used
in the synthesis of uracilated single-stranded DNA for mutagenesis.
Helper phage M13rv1 (30) was used in the synthesis
of single-stranded DNA template for both mutagenesis
and DNA sequencing. Bacteriophage λ[illegible] was used in
lambda sensitivity assays by cross-streaking.

Plasmid pAM1850 was constructed from pAM1520 (10) by
introducing a SacI site covering the codon at residue 43 and
a HindIII site at residue 56; neither of these DNA changes
modified the amino acid coding sequence. Strain pop6510
harboring pAM1850 exhibited the same wild-type phenotype
as with pAM1520 and was used as a positive control in all
LamB function characterizations. pAM1854 encodes a stop
codon (TAA) at residue 56, introduced by oligonucleotide-
directed mutagenesis of pAM1850. pop6510 containing
pAM1854 was used as the negative control in phenotypic
characterizations.

Media and genetic techniques. LB and DYT media were
used as described before (6). Minimal medium A (MMA) was
prepared as described elsewhere (19). Eosin-methylene blue
medium (EMB) was also used (19) and contained 0.4%
maltooligosaccharide (Pfanstiehl Chemicals, Waukegan, Ill.)
together with 0.04% eosin Y and 0.0065% methylene blue as
pH indicators. For growth of plasmid strains, ampicillin was
present at 50 µg/ml in MMA and LB and 100 µg/ml in DYT
and EMB. All plasmid preparations were performed by the
boiling method (7). Transformation of plasmids into bacteria
and all other genetic techniques were performed as described
before (18). Helper phage-aided single-stranded
DNA synthesis was performed as described elsewhere (30).

FIG. 1. (A) Predicted structure of maltoporin folding across the
outer membrane (4), with alternating loosely structured regions
(thick lines) and regions of ordered secondary structure (boxes)
shown. (B) The region of study containing segments d and e is
highlighted; residues 40 through 60 (dotted line) cover part of the
second transmembrane region as well as residues 54 to 63, which
were predicted to form a loop on the surface of the outer membrane.
Identical residues in the corresponding region of OmpF are printed
in boldface letters.

Design and synthesis of degenerate oligonucleotides. Degen- 
erate oligonucleotides 62 bases long, corresponding to the 
wild-type sequence 5' CT TAT GCT GAG CTC AAA TTG 
GGT CAG GAA GTG TGG AAA GAG GGC GAT AAA 
AGC TTC TAT TTC 3' (residues 40 to 60 of mature LamB as
encoded in pAM1850), were chemically synthesized (Applied
Biosystems). Each of the four nucleoside phosphoramidite
substrates was contaminated with 2.33% of each of
the other three nucleoside phosphoramidites. The proportion
of oligonucleotides carrying substitutions was calculated
as described before (21). With 7% contamination (i.e.,
2.33% of each of the other three nucleoside phosphoramidites)
in the synthesis of a 62-mer, we expected that 1.1%
of the oligonucleotides obtained would have a wild-type
sequence, 5.2% would have one base change, 12% would
have two base changes, 17.9% would have three base 
changes, 20% would have four base changes, and 43.8% 
would have five or more base changes. The pool of synthe- 
sized oligonucleotides was purified by using an oligonucle- 
otide purification cartridge (Applied Biosystems) to separate 
incomplete oligonucleotides. 
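
The expected proportions quoted above are consistent with a simple binomial model. The sketch below is an illustration under that assumption, not necessarily the exact procedure of reference 21: each of the 62 positions is treated as independently carrying a wrong base with probability 3 × 2.33% (about 7%). The function name and parameters are hypothetical.

```python
from math import comb


def base_change_distribution(length: int, p_sub: float, max_k: int) -> list[float]:
    """Return P(0), P(1), ..., P(max_k - 1) base changes, then P(>= max_k).

    Assumes each position is substituted independently with probability
    p_sub, i.e. the number of changes is binomially distributed.
    """
    probs = [
        comb(length, k) * p_sub**k * (1 - p_sub) ** (length - k)
        for k in range(max_k)
    ]
    probs.append(1.0 - sum(probs))  # everything with max_k or more changes
    return probs


per_base_contamination = 0.0233          # each of the three "wrong" bases
p_sub = 3 * per_base_contamination       # about 7% per position
dist = base_change_distribution(length=62, p_sub=p_sub, max_k=5)

for k, p in enumerate(dist):
    label = str(k) if k < 5 else "5 or more"
    print(f"{label:>9} base changes: {100 * p:5.1f}%")
```

Under this model the six values agree with the percentages stated in the text to within a few tenths of a percentage point; small differences reflect rounding of the per-position rate.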

Mutagenesis of residues 40 to 60. Plasmid pAM1850 was
transformed into CJ236 for synthesis of the uracilated single-
stranded DNA template. To obtain the Dex- template,
mutagenesis of pAM1850 was performed with the degenerate
oligonucleotides as described by Kunkel et al. (16), with dut
ung selection, except that 30 ng of oligonucleotides and 1 µg
of template were used in the annealing step. The uracilated
template of pAM1854, which has a Lys-56→TAA (stop)
mutation, was used in the second round of mutagenesis,
again with the degenerate oligonucleotide pool. The ligation
mixture was transformed into pop6510 (Dex-) and spread
onto 0.4% maltodextrin-EMB indicator plates. Clones
which appeared dark red were picked and purified on nutrient
agar plates.

Lambda and starch binding assays. Lambda sensitivity was
assayed by cross-streaking isolated Dex+ transformants
against λ[illegible].

The sugar-binding site of maltoporin was assayed in two
ways. For screening of transformants, the chemotaxis soft
agar plate (14) was used. Each of the selected Dex+ isolates
was tested for the ability of bacteria to bind to starch at a
concentration of 2 mg/ml in 0.24% microbiological agar with
0.002% ribose present as an attractant. The size of the
chemotaxis ring formed after incubation at 30°C overnight
was recorded. Ring formation is indicative of a starch-
binding defect, as starch prevents the swimming of bacteria
with a wild-type level of functional sugar-binding site in the
outer membrane (14). A more quantitative measure of the
starch-binding ability of the isolates was made with washed
suspensions of bacteria applied to starch-Sepharose columns
as previously described (8); the proportion of bacteria retained
in these columns is dependent on maltoporin-binding
activity.

DNA sequence analysis. Double-stranded DNA sequencing
was performed by standard Sequenase reactions as described
by the manufacturer (US Biochemicals Corp.,
Cleveland, Ohio) except that 1 ng of primer was used for
annealing, which was done by incubation at 37°C for 20 min.

transformants on indicator plates. Thirteen of 203 transformants
tested were unable to utilize maltodextrins (Dex-
phenotype). Three Dex- clones were sequenced and were
all found to contain a stop codon. One plasmid (pAM1854)
altered at residue 56 (AAA→TAA) was used as the target
template in subsequent experiments.

Mutagenesis of uracilated pAM1854 was performed with
the same pool of degenerate oligonucleotides. Three possible
outcomes could be predicted. Recovered transformants
could have the same sequence as the template and be Dex-;
these would be expected to constitute up to half of all
transformants obtained with uracilated templates (16). Second,
some transformants may have drastic changes in phenotype
because of incorporation of oligonucleotides, including
those from the pool with multiple substitutions, and
would be Dex- if unable to form a small number of channels
in the outer membrane. It should be stressed that most
substitutions affecting phage lambda or sugar binding are still
Dex+, as are those causing even a fivefold drop in LamB
levels in the outer membrane (14). The rest of the transformants
would be Dex+, with at least some channel function
but which could still be altered in phage or sugar binding.
These transformant sequences would all be derived from the
incorporation of mutagenic oligonucleotides, and only a
small proportion of these were expected to have the wild-
type sequence, given the level of misincorporation into the
mutagenic oligonucleotide.

More than 1,500 isolates were obtained on maltodextrin-
EMB-ampicillin plates, 290 of which, or approximately 20%,
appeared dark red, indicating that maltodextrins were getting
through the outer membrane. All 290 were found to be
fully λ sensitive, suggesting that LamB trimer in the right
conformation for phage binding was made in all these Dex+
isolates. Dex- clones were also tested for λ sensitivity in an
attempt to screen for those with a correct surface conformation
but unable to utilize maltodextrins because of a drastic
change in channel conformation. However, all 100 Dex-
clones cross-streaked were λ resistant, indicating that no
functional trimer was present in the outer membrane of these
Dex- isolates. These Dex- transformants were presumed to
include a high proportion carrying the template stop codon at
position 56, as well as some with mutations, including
multiple mutations, rendering them unable to form functional
protein. These isolates were not investigated further.

The Dex+ clones were also tested for the functional state
of their sugar-binding site; previous studies showed that
sugar affinity defects can be present in mutants that form a
channel permitting growth on dextrins (14). Starch binding
by the sugar-binding site was assayed by the soft agar-
chemotaxis plate method, in which bacteria with functional
binding sites are rendered immobile by interaction with
starch (14). Of the 180 Dex+ clones tested, 23 were Bin-, 3
of which contained a stop codon (TAG) at residue 41, 48, and
59, respectively; these contained some protein because of
the supE suppressor of the host strain pop6510, leading to a
Dex+ phenotype with Gln at these sites. The remaining 20
Bin- isolates were further characterized as shown below.
The other 157 isolates were phenotypically wild type by the
criteria of phage receptor function, sugar binding, and channel
formation for dextrins.

The substitutions in 87 of the 157 Dex+ Bin+ isolates were
determined by DNA sequencing. The proportion of the Bin+
isolates having a wild-type DNA sequence in the region was
found to be 14.9%. Also, 40.2% of these isolates contained
one base change compared with the wild type (i.e., lamB
sequence as in pAM1850), 29.9% contained two base


COMBINATORIAL MUTAGENESIS OF LamB PROTEIN 



[FIG. 3 graphic: substitutions in mutants, aligned against wild-type residues 41 through 60 of maltoporin]



FIG. 3. Functionally neutral amino acid substitutions found in 69
isolates exhibiting a wild-type Dex+ Bin+ λs phenotype. Amino
acids are represented by the single-letter amino acid code. The top
and bottom sequences are wild-type residues 41 through 60 of
maltoporin.



changes, 9.2% contained three, 3.4% contained four, and
2.3% contained five base changes. No isolates containing
more than five base changes were obtained. Six of the 35
Bin+ isolates which contained one base change were actually
wild type at the amino acid sequence level, and hence
only 69 of the 87 Dex+ Bin+ isolates had amino acid
replacements. When translated into amino acid changes,
53% of the Bin+ isolates contained one amino acid change,
32% contained two amino acid changes, 13% contained three
amino acid changes, and 2% contained four or more amino
acid changes. Also, we found little bias in the incorporation
of nucleotides: there were 35 base changes to A, 48 base
changes to T, 54 base changes to C, and 47 base changes to
G.

A summary of the functionally neutral amino acid substitutions
(excluding silent substitutions within a codon) found
in each of the 69 Dex+ Bin+ isolates is given in Fig. 3.
Phenotypically neutral substitutions were found at every
residue except residue 40, at the extreme 5' end of the
mutagenized region. Several of the substitutions occurred
more than once and, in double mutants, were often found in
random combinations with other substitutions. For example,
the Trp-51→Arg substitution was found in six isolates, three
times by itself (isolates 21, 32, and 57) and once each with
one, two, or three other replacements (isolates 35, 40, and
55, respectively; Fig. 3). It is therefore unlikely that these
multiple substitutions are required to interact to give a Dex+
phenotype, and we assume that each of the substitutions is
functionally neutral. This conclusion is similar to that previously
obtained in a mutagenesis study of a well-characterized
protein (3). As a simplified summary, all possible
neutral replacements for each residue, even when found in
combination with other replacements, are listed in Fig. 4a.

FIG. 4. Summary of functionally neutral and nonneutral amino
acid substitutions recovered at each residue in the region containing
residues 40 to 60 of maltoporin. The wild-type sequence is shown in
the boxed middle line. (a) Above this line are amino acid substitutions
that have undetectable effects on LamB functions. (b) Shown
below the wild-type residues are substitutions recovered exclusively
in the Dex+ Bin- isolates. Each type of substitution is shown once
even if it occurred in several of the isolates shown in Fig. 3 or in
different combinations with other substitutions.

After sequencing 87 Dex+ Bin+ isolates, it became apparent
that the same substitutions were reappearing in the
sequenced clones. The pattern of substitutions in Fig. 3 may
have been biased by oligonucleotide hybridization preference,
repair at particular codons, or other factors unrelated
to the significance of particular amino acids to LamB function.
One way of testing whether the lack of neutral substitutions
at residues 41 to 43 was a mutagenesis artifact was to
sequence defective mutants as well. If the lack of substitutions
in Dex+ Bin+ mutants was a mutagenesis artifact, then
there should also be few defective mutations in the 40 to 43
region.

Twenty Dex+ Bin- clones were sequenced, and the
substitutions shown in Fig. 5, including the combinations of
multiple substitutions, were found. Most strikingly, the
replacements uniquely present among Bin- isolates were
mostly at residues 40 to 43, although some of these were in
random combination with substitutions already found in
neutral Dex+ Bin+ isolates. As summarized in Fig. 4b, in
which only the uniquely Bin- substitutions are noted, there
is a satisfying inverse correlation with the pattern of the
Bin+ clones. Sites with few functional Dex+ Bin+ substitutions
were most frequently substituted among the Bin-
isolates, indicating that the lack of neutral substitutions at
residues 41 to 43 was not a mutagenesis artifact.

It should be noted that the Dex* Bin" phenotype m the 
chemotaxis assay was previously found to be due either to a 
reduced amount of protein in the outer membrane or to 
normal amounts of protein but reduced sugar affinity at the 
FIG. 5. Amino acid substitutions recovered in 20 isolates with a
starch-binding defect (phenotypically Dex⁺ Bin⁻ λˢ). The wild-type
sequence of residues 40 to 60 of maltoporin is shown on the top line,
followed by the amino acid replacements of each Dex⁺ Bin⁻ isolate.
Residue changes unique to Bin⁻ mutants are boxed (i.e., if not found
among the neutral substitutions in Fig. 3). 



sugar-binding site (14). To distinguish between these possibilities,
the levels of LamB in the outer membrane of
selected Bin⁺ and Bin⁻ isolates were estimated. Outer
membrane was extracted from 18 nonidentical Bin⁻ isolates,
and the levels of LamB in the outer membrane were determined
by gel electrophoresis, as shown in Fig. 6. The results
indicated that all of the Bin⁻ isolates have some reduction in
the level of LamB protein in the outer membrane, as
summarized in Table 1, in which densitometrically scanned
LamB levels were normalized to OmpF/C levels to compensate
for possibly misleading loading differences in Fig. 6.
Also in contrast to wild-type protein trimers, which are
stable and hardly dissociate at 70°C, four (isolates 105, 107,
113, and 115) of five isolates tested had LamB that dissociated
more readily at the lower temperature (Table 1). In
particular, LamB from isolate 105 (Glu-43→Lys) was completely
dissociated, suggesting that trimer stability was affected
by Bin⁻ mutations. It also appeared that these mutant
LamB trimers are more prone to proteolysis (results not
shown). Hence, residues 40 to 43 are important for maltoporin
structure. 
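
The normalization to OmpF/C described above can be illustrated with a short calculation. This is only a sketch: the function name and the peak areas are hypothetical placeholders, not values measured in this study.

```python
def normalized_lamb_percent(lamb_peak, ompfc_peak, wt_lamb_peak, wt_ompfc_peak):
    """Return the LamB level of an isolate as a percentage of the wild type.

    Each lane is first normalized to its own combined OmpF/C band, so that
    differences in the amount of material loaded per lane cancel out.
    """
    if min(ompfc_peak, wt_ompfc_peak, wt_lamb_peak) <= 0:
        raise ValueError("reference peak areas must be positive")
    isolate_ratio = lamb_peak / ompfc_peak      # LamB relative to OmpF/C, isolate lane
    wt_ratio = wt_lamb_peak / wt_ompfc_peak     # LamB relative to OmpF/C, wild-type lane
    return 100.0 * isolate_ratio / wt_ratio


# Hypothetical densitometric peak areas (arbitrary units).
# The isolate lane carries half the material of the wild-type lane,
# which the OmpF/C normalization compensates for.
percent = normalized_lamb_percent(
    lamb_peak=20.0, ompfc_peak=50.0, wt_lamb_peak=80.0, wt_ompfc_peak=100.0
)
print(f"LamB level relative to wild type: {percent:.0f}%")
```

Without the normalization, the raw LamB peaks in this made-up example would suggest 25% of the wild-type level, which overstates the defect because the lane was underloaded.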

To further analyze the effects of mutations on the sugar-binding
site in Bin⁻ isolates, starch binding was also assayed
in starch-Sepharose columns. Ninety percent of the bacteria
with a wild-type LamB-expressing plasmid (pop6510 harboring
pAM1850) were retained in these columns, whereas only
10% of the bacteria without LamB were retained. For the
mutant isolates, the retention rate, expressed as the percentage
of bacteria retained in the column, was as summarized in
Table 1. In confirmation of the chemotaxis experiments, all 




FIG. 6. LamB protein in the outer membrane analyzed by sodium
dodecyl sulfate-polyacrylamide gel electrophoresis (17). Track
a shows the pattern of outer membrane proteins of pop6510 containing
pAM1850; tracks b through s show Bin⁻ isolates 101 through
118, respectively. Purified protein standards were in track t. MBP,
maltose-binding protein. 
TABLE 1. Summary of protein levels and binding activity
Recoverable entries only (column alignment lost): substitutions Y41S; Y41N, K45E; E43V; A42V. Values: 32, 32, 28; ND in the remaining recovered cells.


ᵃ Substitutions unique to the Bin⁻ isolates are tabulated; the
complete set of substitutions in each isolate is shown in Fig. 5. 

ᵇ The gel in Fig. 6 was scanned densitometrically, and the ratio of peak sizes
of LamB relative to that of the combined OmpF/C bands was estimated for
each isolate. The value given is the LamB/OmpFC percentage for isolates
relative to that for the wild type. 

ᶜ Starch binding of bacteria containing LamB-expressing plasmids was
estimated as described previously (14). 

ᵈ Outer membrane extracts from the isolates were electrophoresed after
heating at 70 and 100°C. The stained LamB bands were scanned densitometrically,
and the proportion of the LamB monomer at 70°C versus that at 100°C
was tabulated. 

ᵉ ND, not determined. 



Bin⁻ isolates had reduced starch-binding ability. This was
also true for those mutants (isolates 108, 114, and 118) that
had only slightly reduced levels of LamB in the outer
membrane (Table 1). Hence, some substitutions may influence
the conformation of the sugar-binding site as well as
change protein levels. 

DISCUSSION 

The strategy used in this study has a number of advantages
over current combinatorial cassette mutagenesis methods.
By forcing the mutagenic oligonucleotides to replace a
nonsense codon in the template, it is possible to ensure that
all functional or partly functional sequences result from the
oligonucleotides, regardless of the rate of mutagenesis. This
is simpler than the cassette approach (3), which requires
unique flanking restriction sites for the efficient cloning-in of
large numbers of mutagenic sequences. The use of a uracilated
template also reduced the frequency of template sequences
among our transformants, and our 20% rate of
recovery of Dex⁺ transformants is reasonably satisfactory.
Among the Dex⁺ clones sequenced, there was a preferential
incorporation of oligonucleotides carrying three or fewer
mismatches. This is itself an advantage in that the Dex⁺
isolates were limited in the number of substitutions they
carried. However, a limitation of this method, common to all
mutagenesis methods for extended sequences, was that total
randomization of each codon in the region was not feasible.
Hence, the spectrum of substitutions observed was generally
biased in favor of codons differing by a single base from the
wild-type codon. Nevertheless, we recovered seven mutants
carrying two base changes in the same codon, and thus by
sequencing a large number of mutants, a wide spectrum of
functionally acceptable and defect-causing mutations, encoding
a range of phenotypes, could be obtained. 

Two approaches were used in the analysis of mutants. 
First, functionally neutral substitutions at each residue in the 
region were identified to permit identification of unimportant 
residues. The second approach involved identifying sites of 
mutations present in isolates with a LamB defect. The 
results obtained with these two approaches showed a good 
negative correlation; sites with few permitted neutral substi- 
tutions were more frequently sites of deleterious substitu- 
tions and vice versa. Hence, we have confidence in the 
conclusions on the relative informational importance of 
residues in the 40 to 60 region of maltoporin. We conclude 
that residues 40 to 60 of LamB can be divided into four 
stretches of distinct functional properties. 

Residues 40 to 43. Residues 40 to 43 show homology to 
several outer membrane proteins (22). It is significant that 
most mutants with a starch-binding defect and reduced 
levels of protein in the outer membrane had substitutions at 
residues 41 to 43. No functional substitutions were recovered
at residue 40, but this is probably because only the
second and third bases of the codon were present at the
extreme 5' end of the mutagenic oligonucleotide. However,
the recovery of a Bin⁻ isolate carrying a Thr-40→Ile mutation
suggests that this residue may also be significant in
maltoporin function. The results of starch-binding assays
and protein level assays (Fig. 6 and Table 1) with the Bin⁻
isolates indicated that mutations at residues 40 to 43 led to a 
severe defect in both the level and stability of maltoporin in 
the outer membrane, suggesting that these residues, and 
residues in the corresponding region in other outer mem- 
brane proteins, play an important role in the localization and 
assembly processes. A few amino acid substitutions influ- 
enced starch binding without greatly reducing protein levels 
in the outer membrane. Given the structural changes elicited 
by other substitutions in the 40 to 43 region, it is likely that 
the sugar-binding defect is an indirect conformational effect 
on the binding site rather than a change directly at the 
binding site itself. 

Since the completion of this study, the position of residues 
in OmpF corresponding to residues 41 to 43 of LamB has 
been determined by X-ray crystallography (5). These resi- 
dues were in a transmembrane region deep within the 
monomer-monomer interaction site of OmpF. The phenotype
of LamB mutants within this region is highly consistent 
with such a position in LamB as well. 

Residues 55 to 57. In contrast to residues 40 to 43, residues 
55 to 57 accept a diverse range of substitutions (Fig. 4a), 
suggesting that these residues are unlikely to be of functional 
importance. The flexibility demonstrated at these residues is 
in good agreement with the proposed secondary-structure 
model (Fig. 1) (4), which suggested that these residues 
constitute a membrane-external loop. It has previously been 
demonstrated that a Ser-57→Cys mutant had phenotypes no
different from those of the wild type (12), which is also
consistent with the tolerance of these residues in accepting
neutral substitutions. However, labeling studies on the mutant
carrying the Ser-57→Cys mutation showed that the thiol 
Residues 58 to 60. The region of the highest flexibility
(residues 55 to 57) does not extend to residues 58 to 60,
which were also postulated to be in the loop. A mutant with
a substitution of Phe-58→Asp (along with
neutral substitutions at residues 46 and 55) had a starch-binding
defect. Although at this stage we have no data on the
effect of the Phe-58→Asp mutation in isolation, the relatively
[...] nature of residues 58 to 60 suggests that
these residues are not part of the flexible loop. The mutational
data are more consistent with these residues already
being in the next ordered segment (f, Fig. 1). 

Residues 44 to 54. Residues 44 to 54, with the exception of
residues 44, 49, 52, and 54, all accept a limited range of
substitutions (Fig. 3). At this stage, it is difficult to
conclude whether this stretch of amino acid residues is part
of a transmembrane region, as in Fig. 1, or a region such as
one of the short α-helices found in OmpF or Rhodobacter
porin (5, 27). Alternatively, residues 44 to 54 together with
residues 55 to 57 could constitute a larger external loop. A
larger loop would require a shorter preceding transmembrane
segment, and given that most transmembrane β-structured
segments in OmpF or Rhodobacter porin are less than
17 residues in length (5, 27), the originally proposed 21
residues in segment d (Fig. 1) may have been overestimated.
Nevertheless, the nature of some residues between 44 and
54 appears to be critical. At residue 44, only large and
hydrophobic amino acids were recovered as neutral substitutions
(Fig. 3). Also, Glu-49→Val resulted in a starch-binding
defect (Fig. 5). Given that this residue accepts only
relatively conservative substitutions and that the same residue
is found in OmpF (Fig. 1), it appears that Glu-49 plays a
role in the stability and functioning of LamB in the outer
membrane. Similarly, residues 52 and 54 also appear to have
a restricted range of functional substitutions. In particular,
Lys-52 accepts only Arg, which is of the same charge, and
Ile, which is identical to the corresponding residue in OmpF.
Also, a Lys-52→Glu substitution, along with two other
substitutions at residues 41 and 56, resulted in a Bin⁻
phenotype (isolate 110, Fig. 5), although the effect of the
Lys-52→Glu substitution in isolation is not known at this
stage.

Since both residues 49 and 52 are charged residues, further
investigations are necessary to elucidate whether their potential
roles in LamB structure and function are charge
related. It is important to note that, as shown in Fig. 1,
segment d is highly polar for a potential transmembrane
segment and also shows high amphipathicity as a β-structure,
as is the corresponding region of OmpF (5). Substitutions
changing the nature of the charge were not tolerated at
all except in the loop at residues 53 to 56, as were other
nonconservative substitutions. Charge changes at residues
45, 49, and 52 were associated with a protein structural
defect. These results are consistent with the likely importance
of the amphipathic nature of this segment in LamB
structure, stability, and outer membrane assembly. 

In conclusion, the results presented in this study have
proved useful in the identification of important residues
within a region for which little structural and functional
information was available. The spectrum of neutral and
defective substitutions has provided an internally consistent
pattern of the particular significance of residues 40 to 43 in
the structure and function of LamB and possibly in the
corresponding regions of other outer membrane proteins. It
is also worth emphasizing that the study of functionally
acceptable substitutions provides a means of testing proposed
models of secondary structure. 
SUMMARY 

A library of mutant ecoRIR genes encoding EcoRI restriction endonuclease was generated using trinucleotide blocks and
a combination of recombinant DNA procedures, including primer extension and the polymerase chain reaction. Codons
corresponding to three amino acids (E144, R145 and R200), previously implicated in the specific recognition of the DNA
substrate, were combinatorially mutated so as to generate a library that potentially contains all 20³ possible single, double
and triple aa replacements, in a balanced distribution. Inspection of the phenotypes of Escherichia coli colonies bearing the
mutant genes showed that several of them retained activities that were deleterious to the cells but were still protected by the
EcoRI methyltransferase. These included new enzyme variants, including non-conservative single (Thr or Val for Glu144)
and double (Val for Glu144 and Thr for Arg145) replacements. 



INTRODUCTION 

In the study of the structure-function relationship of 
proteins, as well as for the generation of proteins with novel 
properties, methods for the site-directed mutagenesis of the 
corresponding genes have become an indispensable tool. 
There are several stages, conceptual and methodological, to 
be taken into account when designing a rational muta- 



Correspondence to: Dr. X. Soberon, CIIGB/UNAM, Apdo. Postal 510-3,
Col. Miraval 62271, Cuernavaca, Morelos (Mexico)
Tel. (52-73) 172-399; Fax (52-73) 172-388. 

Abbreviations: aa, amino acid(s); bp, base pair(s); ecoRIR, gene encoding
EcoRI; ecoRIM, gene encoding M·EcoRI; ENase, restriction endonuclease;
EtdBr, ethidium bromide; IPTG, isopropyl-β-D-thiogalactopyranoside;
kb, 1000 bp; Km, kanamycin; LB, Luria-Bertani (medium);
M·EcoRI, EcoRI MTase; MTase, methyltransferase; nt, nucleotide(s);
oligo, oligodeoxyribonucleotide; PCR, polymerase chain reaction; PolIk,
Klenow (large) fragment of E. coli DNA polymerase I; R, resistance; ss,
single strand(ed); wt, wild type. 



genesis strategy: first, even when there is a known three- 
dimensional structure for the protein, it is extremely difficult 
to predict which changes are necessary to achieve the 
desired property of the protein. This type of prediction is 
still harder in the event that more than one residue change 
is needed. Second, frequently the residues one wishes to 
alter are far away from each other in the primary sequence 
(although they may be close together in the active confor- 
mation of the protein), and the combinatorial alteration of 
such residues is highly desirable (Wells, 1990). Third, as the 
number of target residues increases, the use of codons as 
mutagenic units becomes important in order to obtain a 
large number of useful alleles within a manageable mutant 
population (Sirotkin, 1986). We propose here a method for 
combinatorial, saturation mutagenesis, with codons as 
mutagenic units. 

In order to test a mutagenesis method incorporating the 
considerations mentioned above, we chose the restriction 
endonuclease EcoRI as a model protein. The first report on 
the structure of the protein-DNA complex, to 3 Å resolution 
(McClarin et al., 1986), led to the proposition that three aa
(E144, R145 and R200) directly contact the purine bases at
the major groove of the substrate DNA molecule. A recent
revision of the structure (Kim et al., 1990) reveals additional
regions of the protein that contact the pyrimidine bases,
also at the major groove, in a new model which involves
differences in chain connectivity. The lethal activity of the
enzyme in vivo, together with the existence of a corresponding
MTase, which inhibits DNA cleavage by the
ENase, has permitted the design of sufficiently sensitive in
vivo assays for the detection of mutant ENases with either
normal or altered activity and/or specificity for the canonical
site (Heitman and Model, 1990; Oelgeschlager et al.,
1990). 

In this paper, we report the generation of a library of
DNA fragments, potentially containing all of the 8000 possible
aa combinations at three positions that face the DNA
major groove (E144, R145 and R200). We used trinucleotide
blocks in the oligo synthesis and obtained a fragment library
which resulted in the expected distribution of replacements
(single, double and triple), with no obvious bias and no stop
codons. Using this library and a colony phenotype assay,
we have isolated new partially active mutants, including a
double aa replacement, that further strengthen the notion of
the importance of DNA conformation (Lesser et al., 1990),
and/or additional contacts at the bases (Heitman and
Model, 1990), on the recognition process. 
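
The figure of 8000 combinations, and how it splits into single, double and triple replacements, can be checked by direct enumeration. The sketch below is illustrative only; the names `AMINO_ACIDS`, `WILD_TYPE` and `count_by_replacements` are introduced here and are not part of the original work.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard aa, one-letter codes
WILD_TYPE = ("E", "R", "R")            # wt residues at positions 144, 145 and 200


def count_by_replacements(wild_type=WILD_TYPE, alphabet=AMINO_ACIDS):
    """Count aa combinations grouped by how many positions differ from wt."""
    counts = Counter()
    for combo in product(alphabet, repeat=len(wild_type)):
        n_changed = sum(a != w for a, w in zip(combo, wild_type))
        counts[n_changed] += 1
    return counts


counts = count_by_replacements()
total = sum(counts.values())
for n_changed in sorted(counts):
    print(f"{n_changed} replacement(s): {counts[n_changed]} combinations")
print(f"total: {total}")
```

The enumeration shows that distinct triple replacements (19³) vastly outnumber single ones (3 × 19), which is why screening only a few hundred clones samples the multiple replacements sparsely.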



RESULTS AND DISCUSSION 

(a) Combinatorial mutagenesis of aa 144, 145 and 200 

In order to have access to all 20³ possible combinations
of aa at these three positions, we designed an experiment in
which we would introduce, via synthetic DNA, a fragment
consisting of 50% wt codon, 50% of an equimolar mixture
of codons for the 20 aa, at each of the three positions (see
Fig. 1 legend). In the ideal case (e.g., with equal coupling
efficiencies of the trinucleotides and no other artifacts
during chemical and enzymatic manipulations), a binomial
distribution would predict the composition of the fragment
to be 12.5% wt in all three positions, 37.5% single, 37.5%
double and 12.5% triple replacements. The number of
clones sequenced so far (as shown in Table I) is still too low
to tell how close the actual distribution of changes is, compared
to the ideal. Nonetheless, our results do show that we
have a collection with a variety of replacements, in all three
positions, which has already provided mutants with interesting
phenotypes (see section b). 
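
The ideal composition quoted above follows from a binomial distribution over three independent positions, and can be reproduced with a few lines. This is a sketch under the idealized assumptions stated in the text; the function and parameter names are ours. The second call adds a further assumption, labeled as such: since the equimolar mixture also contains a codon for the wt aa, only 19/20 of the mixture changes the aa, so the per-position probability of an aa replacement would be 0.5 × 19/20.

```python
from math import comb


def replacement_distribution(n_positions=3, p_replaced=0.5):
    """Binomial probability of k replaced positions, for k = 0..n_positions."""
    return {
        k: comb(n_positions, k) * p_replaced**k * (1 - p_replaced) ** (n_positions - k)
        for k in range(n_positions + 1)
    }


# Codon level: 50% wt codon, 50% mixture at each of the three positions.
codon_level = replacement_distribution(p_replaced=0.5)

# aa level (assumption): 1/20 of the mixture encodes the wt aa (silent change).
aa_level = replacement_distribution(p_replaced=0.5 * 19 / 20)

for label, dist in (("codon level", codon_level), ("aa level", aa_level)):
    summary = ", ".join(f"{k}: {100 * p:.1f}%" for k, p in dist.items())
    print(f"{label}: {summary}")
```

At the codon level this gives the 12.5%, 37.5%, 37.5% and 12.5% stated in the text; at the aa level the unchanged class is somewhat larger because of the silent substitutions.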

This mutagenesis scheme goes beyond previous reports
on combinatorial mutagenesis (Reidhaar-Olson and Sauer,
1988; Dunn et al., 1988; Sartorius et al., 1989). First, we
have targeted residues that are far away from each other in
the sequence. Second, through the use of trinucleotides we
have aimed at a non-biased collection of replacements, 

TABLE I 

Variant enzyme phenotypes 

EcoRI mutantsᵃ        aa at positionᵇ     Plating phenotypeᶜ without M·EcoRIᵈ
                      144   145   200     -IPTG      +IPTG
wt                                        +          -
E144C                 C                   ++         -
E144T                 T                   +++++      ++ (?)
E144V                 V                   +++++      + (?)
E144G                 G                   +++++      +++++
E144F                 F                   +++++      +++++
E144W                 W                   +++++      +++++
R145K                       K             +++++      +++
R200C                             C       +++        +
R200K                             K       +++++      +++++
R200N                             N       +++++      +++++
R200H                             H       +++++      +++++
R200Qᵉ                            Q       ND         ND
ER-VTᶠ                V     T             +++++      ++
ER-KY                 K     Y             +++++      +++++
ER-PF                 P     F             +++++      +++++
ER-YWᵉ (ASN 149)      Y     W             ND         ND
ER-DEᵉ                D     E             ND         ND
ERR-VTG               V     T     G       +++++      +++++
ERR-IKL (LEU 203)     I     K     L       +++++      +++++
ERR-DIHᵉ              D     I     H       ND         ND
ERR-RHT               R     H     T       (?)        +++++


ᵃ Designations for variant ENase indicate the original aa at the left and
the aa present in the mutants at the right, separated by the aa number
where the replacement occurred. Numbers for multiple replacements are
omitted for simplicity; they can be derived from columns 2-4. Numbers
in parentheses denote mutations present outside the intended mutagenesis
window. 

ᵇ Sequencing was performed on the entirety of PstI-HindIII fragments,
derived from mutant colonies, cloned on M13mp19. The same fragments
were also recloned in PstI-HindIII-digested pKGS to verify the observed
phenotypes (shown in Fig. 2). 

ᶜ JM101 colony phenotypes were assessed visually. Colonies were grown
on Km plates at a final concentration of 50 µg/ml. Where indicated, plates
were supplemented with IPTG to a final concentration of [...] mM.
+++++, normal appearing colony; ++++, lower density colonies
but similar size; +++ and ++, reduced size and visibly translucent
colonies; +, flat, small and translucent colonies (similar appearance as
the parent enzyme); -, no growth. ND, not determined. 

ᵈ When the mutant genes were expressed in a MTase context, all
colonies had a normal phenotype, including colonies bearing the parent
ecoRIR gene. Under induction conditions only the parent gene is lethal
(see Fig. 2). 

ᵉ Data obtained from direct cloning of the BglII-PstI fragment in M13
vectors. 

ᶠ In this case, the ecoRIM gene was complemented both in trans, in a
compatible plasmid, and in cis, with the ecoRIM gene cloned in the same
plasmid as the ecoRIR gene. In both cases, there was a protective effect
of the MTase on the lethality of the mutant ENase under induction
conditions. The rescue was more evident in the cells containing the
ecoRIM gene present in trans (see Fig. 2). 
Fig. 1. Combinatorial mutagenesis strategy. The short, solid lines indicate synthetic oligos used as mutagenic primers or as ligation adapters (thickenings
denote mutagenic regions). Hatched double lines denote the M13mp18 ss DNA template. Dashed lines depict the extension products. Methods: All cloning
[...] 1979) as recipient strain. Plasmid pKGS, which carries a full-length ecoRIR gene under the control of the inducible promoter lacUV5, and pSC101meth,
a TcR derivative of pSC101 containing the ecoRIM gene (Betlach et al., 1976), were a generous gift from P.J. Greene (UCSF). Oligos were synthesized
on a DNA synthesizer (Systec 1460A) using the phosphoramidite method (Beaucage and Caruthers, 1981) and controlled pore glass as support (Adams
et al., 1983). In some steps, those involving the introduction of a cocktail of trinucleotides, a mixture was prepared containing equimolar amounts of
20 different fully protected phosphotriester trinucleotides, synthesized as described (Broka et al., 1980). Each trinucleotide corresponded to a codon
for one aa, as follows: Ala, GCT; Arg, CGT; Asn, AAC; Asp, GAC; Cys, TGT; Glu, GAA; Gln, CAG; Gly, GGT; His, CAC; Ile, ATC; Leu, CTG;
Lys, AAA; Met, ATG; Phe, TTT; Pro, CCG; Ser, TCT; Thr, ACT; Trp, TGG; [...]
for replacements, the growing oligo was reacted, using the procedure of Ito et al. (1982), through a manual injection valve, with a solution made with
a total of 20 µmol trinucleotides from the cocktail described above and 20 µmol of the trinucleotide corresponding to the aa present in the wt EcoRI
sequence. Purification of oligos was performed by gel electrophoresis in [...] polyacrylamide-8 M urea gels. Large pieces were sliced from the gel to allow
for heterogeneity of migration of the mixed oligos. For the creation of convenient restriction sites, oligo-directed mutagenesis was performed essentially
as described (Su and El-Gewely, 1988). M13mp19HPECO, an M13mp19 clone that carries the HindIII-PstI insert, which corresponds to the fragment
coding for aa 68 to 205 of the ENase, was employed as a single-stranded template. We simultaneously added all three mutagenic primers
5'-GCCATTAGATCTTGATCTCCiC, 5'-GACCCCTCTAGAAAAAGGACGT, 5'-TTAACAACGCGTCCATCTGGTC, plus the M13 'universal'
primer. These oligos introduced silent mutations creating BglII, XbaI and MluI sites, at positions corresponding to aa 135, 169 and 187, respectively.
A clone scoring positive for all three mutations was chosen and, after sequence verification, used as a source of DNA to return the HindIII-PstI fragment,
bearing the mutations, to the ecoRIR gene of the pKGS plasmid, substituting for the original region. Combinatorial, site saturation mutagenesis at aa positions
144, 145 and 200 of EcoRI endonuclease was approached by the following strategy: M13mp18HPECO, an M13mp18 clone containing the HindIII-PstI
insert of the ecoRIR gene, was used to infect E. coli strain JM101. The ss DNA was purified and used as template for primer extension with the two
mutagenic oligos (5'-TCAAGATCTAATGGCTGCTGGTAATGCTATCXXXXXXTCTCATAAGA, 5'-CGCGTTGTTAATCTTGAGTATAATTCTGGTATATTAAATXXXTTAGATCGAC,
where the X denote positions mutagenized in the manner described above) in separate experiments. The
mutagenic oligos were phosphorylated by T4 polynucleotide kinase using [γ-32P]ATP. The radioactive oligos were annealed to the ss DNA and treated
with PolIk for 1 h at 37°C. After inactivating the polymerase by heating at 65°C for 10 min, the DNA was digested with MluI or EcoRI depending on
the oligo used in the extension. The ss DNA fragment produced was purified by 20% polyacrylamide-8 M urea gel after visualization of the band by
autoradiographic exposure. The ss DNA fragments of 161 and 99 nt (corresponding to the expected lengths of the fragments generated with the oligos
complementary to the regions of aa 144-145 and 200, respectively) were gel purified. The fragments (approx. 0.03 pmol each) were then ligated to one
another, aided by 0.3 pmol of an adaptor oligo (5'-TTAACAACGCGTCCATCTGGTC) in a 20 µl reaction. The expected 260 nt ss DNA product was
gel purified and subsequently amplified by PCR. We used 5'-ATCAAGATCTAATGGCTGCTGGTAA and 5'-CTAGAGTCGACCTGCAGTTA as
primers, and performed 30 cycles at 95, 55 and 65°C, for 1.5, 1.5 and 3 min, respectively. The amplified fragment was ligated to adapters (due to the
unexpected difficulty to digest with BglII at the site located 4 bp from the end of the fragment) and then digested with BglII + PstI. The purified fragment
was cloned either in pKGS, replacing the wt with the mutant region as a cassette, or in M13mp19, for direct sequencing. 
containing a suitable distribution of triple, double and single
replacements, with no stop codons; with methods based on
synthesis with mononucleotides, the resultant mutant libraries
would be expected to be poorer, due to inherent limitations
imposed by the degeneracy of the genetic code as
well as the fact that the triplet is the coding unit. As the
advantage of using triplets for mutagenesis, although significant
(Sirotkin, 1986), is probably offset by the general difficulty
of producing and using such synthetic units, we are
currently working at further advancing these methods,
adapted to more widely available synthetic chemistries. Our
efforts are also stimulated by the versatility attainable by the
use of PCR. 

(b) Phenotypes of bacteria bearing the mutant ecoRIR 
genes 

The transformation mixture of the mutant fragment
cloned in pKGS was plated in conditions in which the gene
is repressed (no IPTG added). The ecoRIR gene present in
this plasmid carries a mutation that results in a protein
product altered at one aa (E160→D) (Kuhn et al., 1986)
and with lower activity under repressing conditions (note
that all of the mutants reported in the present study contain
this replacement as well). The presence of this variant of the
ecoRIR gene, under the repressed conditions, is tolerated
but confers a translucent, flat appearance to the E. coli
JM101 colony bearing it. With the mutant collection, we
observed colonies with at least three distinctly different
phenotypes: normal, intermediate, and wholly flat-translucent
(Fig. 2). Furthermore, when these colonies were
replica-plated to media containing the inducer, we could
again observe differences among them, namely: normal
growth, poor growth and no growth (see Fig. 2 and Table
I). To verify the correspondence between the sequence and
the observed phenotype, plasmid DNA was purified and
used both for retransformation and as a source for the
sequencing. 

Out of the limited number of colonies analyzed so far
(about 10³, from several transformation experiments), some
preliminary observations could be noted. Some 10% of the
colonies displayed phenotypes indistinguishable from those
carrying the parent gene, under repressing conditions.
Sequencing experiments from several of such colonies
revealed the presence of the expected silent mutations
(coming from codons present in the trinucleotide mixture
used to synthesize the oligo, which are synonymous, but
different from the original). 

Less than 1% of the colonies showed the intermediate
phenotype; sequencing of several such clones revealed only
the Cys144 and Cys200 variants. From a few hundred
colonies that were replica plated to inducing conditions,
about 2% manifested a distinguishably poorer growth
(E144→V, E144→T, R145→K, and E144R145→VT were
identified this way). Overall, therefore, there is an indication
that less than 5% of the variant EcoRI altered in one or
more of these three residues confers a phenotype to the cells
carrying them (as detected in our assay system). 

Sensitivity to the presence of the M·EcoRI was tested
for some of the mutants either by transforming the corresponding
plasmid DNA into JM101 cells bearing the
ecoRIM gene in a compatible plasmid or by replacing the
mutant fragment in a plasmid that contains both the ecoRIR
and the ecoRIM genes. All mutants tested showed a
MTase-sensitive phenotype
(Fig. 2). It was striking to observe variants with remanent
activity that are still MTase-sensitive, especially those
containing entirely non-conservative (e.g., E144→V) as well
as double (E144→V + R145→T) aa replacements. 

Our results are in general agreement with recent reports
(Heitman and Model, 1990; Needels et al., 1989; Alves
et al., 1989) showing that some replacements at aa 144, 145
and 200 of EcoRI generate ENases with reduced activities,
but are still specific towards the GAATTC sequence. 

The use of combinatorial mutagenesis, together with our
assay system and, possibly, the utilization of the mutant at
position 160, permitted the identification of the double
replacement variant that retains activity against the canonical
site. It is noteworthy that the phenotypes conferred by some
of the mutations we isolated differ from those previously
reported. For instance, mutants E144→T and E144→V
have detectable in vivo activities in our hands, whereas
Heitman and Model (1990) listed them as inactive. On the
other hand, we observed a null phenotype for E144→G,
listed as marginally active by the same authors. Since our
detection system is different and uses a different strain from
those previously reported, it is possible that mutations that
result in mechanistically different activities (e.g., nicking,
dissociation prior to hydrolysis of the second strand, etc.)
stand out in one assay, but not in the other. We are currently
working on the isolation of the mutations at aa 144, 145 and
200 from the lesion at position 160, which is also a conceivable
cause of the discrepancies noted above. 

Due to the amendments on the crystal structure of
EcoRI, there has been caution in interpreting recent studies
of EcoRI in structural terms (Lesser et al., 1990). Based on
the description from the article reporting the revised, refined,
coordinate set (Kim et al., 1990), as well as on our
own observations using the available α-carbon coordinates
(Brookhaven PDB entry 1R1E), we infer that the side
chains of aa 145 and 200 are within hydrogen bonding
distance of the bases at the major groove of the DNA
molecule, with aa 144 also nearby. Therefore, although our
results substantiate the notion that additional contacts are
involved in the determination of specificity, they clearly
pertain to the protein-DNA interface. 

(c) Conclusions

In summary, we have implemented a method to combinatorially
mutagenize three residues implicated in the specific
recognition of DNA by endonuclease EcoRI (McClarin
et al., 1986). Our collection is composed of a balanced
proportion of single, double and triple replacements and
allowed us to isolate several mutants that retain activity.
The observation of an increasing number of altered proteins
with replacements at these three residues that still retain
activity and specificity reinforces the experimental support
of the notion that other aa play a significant role in the
recognition and cleavage processes. Work under way is
aimed at the screening of a sufficiently large number of
clones in order to score the phenotypes of other multiple
replacements as well as extending the mutagenesis window.
We believe that our approach, applied to all residues
involved in hydrogen bonding interactions, as interpreted
from the new structure, should be valuable in providing
information with regard to the possible overdetermination
of the specificity of this enzyme.

Plasminogen activator inhibitor (PAI-1) rapidly inactivates
tissue plasminogen activator (t-PA) and urokinase (UK) with
nearly identical association rate constants. The contributions
of Ser344, Ala345, and Arg346 (P3, P2, and P1 residues,
respectively) in PAI-1 to inhibition of UK and t-PA were
evaluated using combinatorial mutagenesis of the human PAI-1
cDNA. A bacteriophage λ expression library potentially encoding
the 8000 unique PAI-1 species was screened for inhibitory
activity against UK using a fibrin indicator gel. 390 plaques
demarcated by zones of retarded fibrinolysis were analyzed to
determine the DNA sequences of their associated active PAI-1
species. We found 134 unique PAI-1 variants that retained
inhibitory activity towards UK; they contained a variety of
amino acids in their P3 and P2 positions but only Arg or,
infrequently, Lys in their P1 position. Each of the unique
active PAI-1 species was assayed for inhibitory activity towards
UK or t-PA; many substitutions differentially affected the
ability of the inhibitor to inactivate UK and t-PA. For example,
replacement of Ser344 and Ala345 with Val and Pro, respectively,
yielded a PAI-1 variant exhibiting an association rate constant
that was unchanged for t-PA but decreased 23-fold for UK,
relative to native PAI-1. In general, the PAI-1 variants were
more potent inhibitors of t-PA than UK. Hence, t-PA appears more
tolerant than UK of structural diversity present in the P3 and
P2 positions of the PAI-1 variants.



The amino acid sequence of human PAI-1, as deduced
from its cDNA sequence (1-3), reveals that PAI-1 is a member
of the serpin class of serine proteinase inhibitors (4, 5).
Serpins form a complex with their target proteinase that is
presumed to involve an intermolecular covalent bond. Formation
of this putative covalent linkage would result in proteolytic
cleavage of the inhibitor. NH2-terminal sequencing of
the COOH-terminal proteolytic fragment of PAI-1 generated
from its interaction with t-PA confirms that the P1 and P1'
positions of the reactive site of PAI-1 are occupied by Arg346
and Met347, respectively (6).

The association of proteinase inhibitors with their target
proteinases typically involves amino acid residues in the
inhibitor in addition to those situated at the reactive site, P1
and P1' (7, 8). These secondary interactions contribute to the
exquisite specificity and efficacy exhibited by many
proteinase inhibitors. Amino acid residues in the inhibitor
that are contiguous to the reactive site frequently occupy
subsites in the extended active site region of the proteinase
and presumably are key determinants of the interaction.
Studies using amidolytic substrates suggest that the extended
active site regions of t-PA and UK contain S2 and S3 subsites
(9). We hypothesized that the P2 and P3 residues of PAI-1
occupy these subsites and thereby contribute to the stability
of the inhibitor-proteinase complex.

The random and simultaneous substitution of amino acids
in multiple positions of a protein sequence by combinatorial
cassette mutagenesis has proven to be a powerful strategy for
generating a large number of variant proteins that can
subsequently be monitored for consequences on activity (10-12).
We have applied this approach to PAI-1 and used it to
evaluate the contribution of the P3 through P1 residues of
PAI-1 to inhibitory activity. Knowledge of the amino acid
substitutions in PAI-1 that are tolerated without abolishing
inhibitory activity provides insight into the specificity
restraints imposed by the active site regions of these
plasminogen activators.

EXPERIMENTAL PROCEDURES 

Materials - Human plasminogen (Glu-type), Desafib, Spectrozyme
PL, PAI-1 monoclonal antibody 379, and ImmunoBind PAI-1 ELISA
kits were from American Diagnostica (Greenwich, CT). Recombinant
t-PA, predominantly single-chain, was from Genentech (South San
Francisco, CA); it was converted to two-chain t-PA by treatment with
plasmin-Sepharose as described elsewhere (13). Human thrombin
was from Sigma. Bovine fibrinogen was from Calbiochem (La Jolla,
CA). Restriction enzymes were from New England Biolabs (Beverly,
MA). T4 DNA ligase was from Boehringer Mannheim. Native Taq
polymerase was from Perkin-Elmer Cetus (Norwalk, CT). Lambda
Zap II, Escherichia coli strain XL-1 Blue, and Gigapack Gold were
from Stratagene (La Jolla, CA). Metal chelate-Sepharose and NAP-10
gel filtration columns were from Pharmacia LKB Biotechnology
Inc. [α-35S]dATP was from Amersham Corp. Centriprep-30 devices
were from Amicon (Danvers, MA). Sequenase Version 2.0 sequencing
kits were from U. S. Biochemical (Cleveland, OH).
Pyr-Gly-Arg-4-methylcoumaryl-7-amide and
MeOSuc-Ile-Gly-Arg-7-amido-4-methylcoumarin were from Peninsula
Laboratories (Belmont, CA) and Enzyme Systems Products
(Livermore, CA), respectively. The chromogenic substrate S-2288
was from Kabi-Vitrum (Stockholm, Sweden). Oligonucleotides were
synthesized with an Applied Biosystems

3 The terminology for the amino acid residues of the inhibitor in
the vicinity of the reactive site (Pn and Pn') and the complementary
subsites of the plasminogen activators (Sn and Sn') are adapted from
Schechter and Berger (25).
[...] the P3, P2, and P1 positions (amino acids 344, [...]

and PstI released a fragment encoding His [...] through Leu [...]. This
fragment was joined to a synthetic tryptophan promoter/operator
[...] flanked by 5'-[...] and 3'-NdeI sites [...]
using an oligonucleotide linker with 5'-NdeI and 3'-ApaLI
ends that contribute the codons for the initiation Met and [...] of the
PAI-1 amino acid sequence. DNA sequencing using the dideoxy chain
[...] was performed to verify the integrity of the

[...] to the 203-bp mutant [...] of cassettes



ration in LB containing ampicillin. Cells were [...]
[...] sodium phosphate, 150 mM NaCl, pH 7.2 (phosphate-buffered
saline), and broken by [...]
disrupter (Stansted Fluid Power, Stansted, United Kingdom). The
cell homogenate was clarified by centrifugation at 100,000 × g for 40
min and the supernatant fraction was diluted with an equal volume
of [...] Hepes, [...]. The [...]
supernatant was applied to a metal chelate [...]
with ZnCl2 and equilibrated with 20 mM Hepes, 0.25 M NaCl, [...]
Tween [...], pH 7.5. The column was washed exhaustively with the
equilibration buffer and eluted with a 0-0.2 M imidazole gradient.
PAI-1-containing fractions were identified by immunoblotting with a
PAI-1-specific monoclonal antibody, pooled, and concentrated using
a Centriprep-30 device. The recovered PAI-1 species were typically
greater than 25% pure as judged by comparison of the values
determined with the PAI-1-specific ELISA and the micro bicinchoninic
acid protein assay (Pierce Chemical Co.).

Determination of Association Rate Constants - The purified
recombinant PAI-1 variants were activated by treatment with 4.0 M
guanidine HCl at 37 °C (18) for 1 h followed by desalting using NAP-10
gel filtration columns. The concentrations of the activated PAI-1
species were determined by quantitative neutralization of t-PA or
UK in a direct amidolytic assay using S-2288 (17). [...]
inhibition were monitored as follows: plasminogen activators (1 nM)
were added to phosphate-buffered saline containing activated PAI-1
(2-10 nM) and a fluorogenic substrate (either
MeOSuc-Ile-Gly-Arg-7-amido-4-methylcoumarin or
Pyr-Gly-Arg-4-methylcoumaryl-7-amide). The Michaelis-Menten
constants of t-PA for MeOSuc-Ile-Gly-Arg-7-amido-4-methylcoumarin
and UK for Pyr-Gly-Arg-4-methylcoumaryl-7-amide were 0.63 and
[...], respectively. [...] fluorescence was monitored with a [...]



NotI to yield a 1.3-kb promoter/mutant [...]

This cassette was then inserted into lambda Zap II and packaged
using Gigapack Gold to yield a total of 1.2 × 10^[...] plaque-forming
[...]. This expression library contained 40% recombinants as
determined [...]

cDNA sequences. None of the predicted cleavages eliminated a unique
[...]

grown for 10 h at 37 °C. The resultant plaques were monitored for
[...] of PAI-1 activity by a modification of reverse fibrin
[...]. The plaques were placed in contact with
spontaneously lysing indicator gels containing human fibrinogen (2.1
mg/ml), human Glu-plasminogen (6 µg/ml), human thrombin ([...]),
human UK (12 pM), and agarose (1%). The agar plates
were [...] after [...] h at 37 °C and the indicator gels were further
[...] for 6 h at 25 °C. Positive λ clones were identified, eluted,
[...] and [...] to a second round of functional screening.

Double-stranded plasmids were excised from isolated bacteriophage
using the Stratagene protocol and the nucleotide sequences of the
[...] regions of the PAI-1 cDNAs were determined.
Assay of Variant PAI-1 Species - Cultures of E. coli [...]

cDNAs were grown to saturation in 5 ml of LB containing ampicillin
(50 µg/ml). Cells were harvested, resuspended in 100 µl of [...]
[...] sodium citrate, pH 7.2, sonicated for 30 s on [...],
and centrifuged at 5000 × g for 10 min. PAI-1 [...] from the
soluble fractions were quantified by [...]
calibrated with recombinant yeast-derived native PAI-1 ([...]). Soluble
fractions were adjusted to 4.0 M guanidine HCl and 475 nM PAI-1
antigen. Following incubation at 37 °C for 1 h the [...] were diluted
[...] and further incubated with 100 pM UK or t-PA [...]
at 37 °C in a volume of 100 µl. Subsequently, 0.4 mM Spectrozyme PL,
80 µg/ml human Glu-plasminogen (and 100 µg/ml Desafib for t-PA
assay) was added to yield a 200-µl final volume. A[...] values were
continuously monitored for 1 h at 25 °C using a [...]
reader (Molecular Devices, Palo Alto, CA). Residual plasminogen
activator activity was determined by comparing velocities (ΔA/time)
to UK or t-PA standard curves.
Purification of Selected Recombinant PAI-1 Variants - Cultures
(200 ml) of E. coli strain XL-1 Blue transformed with plasmids
containing the promoter/PAI-1 mutant cDNAs were grown to satu-



460 nm, respectively. The time-dependent inhibition of plasminogen
activators at varying concentrations of PAI-1 was recorded as a family
of progress curves (19). The data from each curve were fit to the
integrated first-order rate equation, F = v_s t + [...],
by nonlinear regression, which allowed for the calculation of the
apparent k', where k' = [...] + [k_1/(1 + S/K_m)] I. A plot of k' as a
function of PAI-1 concentration (I) yielded a line whose slope was
equal to k_1/(1 + S/K_m), from which the second-order rate constant,
k_1 or k_assoc, was calculated.
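
The slope-based step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis code: the function names, the substrate concentration, the Km, and the apparent rate constants are all hypothetical placeholders. The sketch fits a least-squares line to apparent rate constants k' versus inhibitor concentration I, then multiplies the slope by (1 + S/Km) to recover the second-order rate constant.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def association_rate_constant(inhibitor_conc, k_apparent, substrate_conc, km):
    """Second-order rate constant from the slope of k' versus I.

    The slope equals k1 / (1 + S/Km), so k1 = slope * (1 + S/Km).
    """
    slope, _intercept = linear_fit(inhibitor_conc, k_apparent)
    return slope * (1.0 + substrate_conc / km)


# Hypothetical, noise-free example data (not taken from the paper).
TRUE_K1 = 5.0e6          # M^-1 s^-1, assumed value used to generate the data
S, KM = 1.0e-4, 5.0e-4   # substrate concentration and Km in M (assumed)
K_OFF = 1.0e-4           # s^-1, assumed intercept term
I_CONC = [2e-9, 4e-9, 6e-9, 8e-9, 10e-9]  # 2-10 nM inhibitor
K_APP = [K_OFF + TRUE_K1 / (1.0 + S / KM) * i for i in I_CONC]

K1 = association_rate_constant(I_CONC, K_APP, S, KM)
print(f"recovered k1 = {K1:.3e} M^-1 s^-1")
```

With real progress curves, each k' would first have to be obtained by fitting the individual curve; that step is omitted here.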



RESULTS

A set of variant PAI-1 cDNAs encoding all possible amino
acid combinations in the P3, P2, and P1 positions was
constructed by cassette mutagenesis using synthetic degenerate
oligonucleotides. The modified PAI-1 cDNAs were joined to
a tryptophan promoter/operator region and inserted into
bacteriophage λ to generate a PAI-1 variant expression library
(Fig. 1). PAI-1 variants were screened for inhibitory activity
against UK using a spontaneously lysing fibrin indicator gel



[Fig. 1 schematic, sequence panel; AvaI site indicated]
P3  P2  P1  P1'
Val Ile Val  X   X   X  Met Ala Pro Glu
GTC ATA GTC NN[...] NN[...] NN[...] ATG GCT CCC GAG




Fig. 1. Schematic representation of the bacteriophage PAI-1
variant expression library. The set of degenerate PAI-1 cDNAs
[...] joining the 5' end of the [...]
with synthetic oligonucleotides (stippled region) designed from the 3'
end of the PAI-1 cDNA. The nucleotide and predicted amino acid
sequences [...] of the reactive site of PAI-1 are shown.
Positions designated by N represent equal mixtures [...]
C used during oligonucleotide synthesis. The amino acids occupying
the P3, P2, and P1 positions are designated by X [...]
replacements in this region. The tryptophan promoter/operator is
designated TRP and depicted as the cross-hatched region. The λ arms
are represented by the [...] lines. Relevant restriction endonuclease
sites are indicated.
containing plasminogen and UK. Bacteriophage expressing
an active PAI-1 species were identified by opaque zones of
retarded fibrinolysis in the indicator gel. At least 500 zones
were readily detected from a sampling of 80,000 recombinant
bacteriophages.

390 of the bacteriophages encoding an active PAI-1 species
and 42 randomly selected inactive PAI-1 species were isolated,
the double-stranded plasmids harboring the promoter/PAI-1
variant cDNAs were excised, and the nucleotide sequences of
each cDNA in the region of the P3, P2, and P1 residues were
determined. Nucleotide frequencies were calculated from the
nine bases encoding the P3-P1 positions of the randomly
selected inactive PAI-1 cDNA variants. The observed
frequencies of G, A, T, and C equalled 0.28, 0.17, 0.32, and 0.23,
respectively. These values were very similar to the predicted
frequencies of G, A, T, and C, which equalled 0.30, 0.20, 0.30,
and 0.20, respectively.
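
One way to put a number on "very similar" is a chi-square goodness-of-fit comparison of the observed and predicted base frequencies. The sketch below is an illustration only; the text does not say that the authors performed this test. It assumes that the frequencies were tallied over 42 clones × 9 bases = 378 positions and that the rounded frequencies quoted above are adequate inputs.

```python
# Frequencies quoted in the text for the 42 randomly selected inactive clones.
OBSERVED = {"G": 0.28, "A": 0.17, "T": 0.32, "C": 0.23}
PREDICTED = {"G": 0.30, "A": 0.20, "T": 0.30, "C": 0.20}

# Assumption: 42 clones x 9 randomized bases each.
N_BASES = 42 * 9


def chi_square(observed_freq, expected_freq, n):
    """Pearson chi-square statistic computed from frequencies and a total count."""
    total = 0.0
    for base, expected in expected_freq.items():
        observed_count = observed_freq[base] * n
        expected_count = expected * n
        total += (observed_count - expected_count) ** 2 / expected_count
    return total


CHI2 = chi_square(OBSERVED, PREDICTED, N_BASES)
# Critical value of the chi-square distribution, 3 degrees of freedom, alpha = 0.05.
CRITICAL_3DF_05 = 7.815

print(f"chi-square = {CHI2:.2f} (critical value {CRITICAL_3DF_05})")
print("consistent with prediction" if CHI2 < CRITICAL_3DF_05 else "deviates from prediction")
```

Under these assumptions the statistic falls below the 5% critical value, which agrees with the statement that the inactive clones show no marked nucleotide bias.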

The PAI-1 variant cDNAs that were analyzed encoded 135
unique active and 42 unique inactive PAI-1 species (Table I).
121 of the 135 active PAI-1 species retained an Arg in the P1
position, while the remainder contained a Lys substitution in
this position. 14 of 19 and 16 of 19 possible PAI-1 variants
containing sole amino acid substitutions in either the P3 or
P2 positions, respectively, were active towards UK. 90 of the
active PAI-1 variants contained simultaneous substitutions
in the P3 and P2 positions; this value corresponds to 25% of
the theoretical maximum. Only seven PAI-1 variants containing
simultaneous substitutions in the P3, P2, and P1 positions
were identified as being active towards UK; each of these
contained Lys in the P1 position. Primary structures of the
42 randomly picked inactive PAI-1 variants possessed an
apparently random distribution of amino acids in the P3, P2,
and P1 positions.
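
The counts behind "19 possible" single variants and the "theoretical maximum" for double variants follow from simple combinatorics. The short sketch below reproduces them; it assumes only the 20 standard amino acids, and the variable names are illustrative.

```python
AMINO_ACIDS = 20

# Single substitutions at one position: every residue except the native one.
singles_per_position = AMINO_ACIDS - 1              # 19

# Simultaneous substitutions at P3 and P2 (native P1 retained).
double_p3_p2 = singles_per_position ** 2            # 361

# All amino acid combinations over the three randomized positions.
total_species = AMINO_ACIDS ** 3                    # 8000

# Fraction of possible P3/P2 double variants recovered as active (90 in the text).
fraction_active_doubles = 90 / double_p3_p2

print(singles_per_position, double_p3_p2, total_species)
print(f"active P3/P2 double variants: {fraction_active_doubles:.1%}")
```

The computed fraction rounds to the 25% quoted in the text.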

The 135 active and 42 randomly picked inactive PAI-1
species were individually synthesized in E. coli using the
rescued PAI-1 expression plasmids. Immunoblot analysis of
bacterial lysates fractionated by sodium dodecyl
sulfate-polyacrylamide gel electrophoresis (20) (Fig. 2) demonstrated that
the recombinant PAI-1 synthesized in E. coli (lane 1)
comigrated with PAI-1 internally expressed in yeast (lane 3) (17).
The electrophoretic mobilities of each of 12 different active
and inactive PAI-1 variants were virtually identical to that of
E. coli-derived native inhibitor (data not shown). The
immunoreactive species was not present in E. coli lysates harboring
the parent expression vector but lacking the PAI-1 cDNA (lane
2). PAI-1 purified from HT1080 fibrosarcoma cells (lane 4)
exhibited a slightly larger apparent M_r value due to the
presence of carbohydrate, a characteristic not shared by E.
coli- and yeast-derived recombinant PAI-1.

Guanidine-treated lysates containing equal amounts of
PAI-1, as judged by PAI-1-specific ELISA, were incubated
with UK for 15 min at 37 °C and the residual plasminogen
activator activities were assayed. The PAI-1 variants that
were sorted with respect to position(s) of the amino acid
substitution(s) (Table I) are ranked within these groups
according to their abilities to inhibit UK. None of the 42
randomly picked inactive variants exhibited measurable
activity toward UK. Only 35 of the 134 active PAI-1 variants
exhibited greater than 60% of the inhibitory activity against
UK relative to that of native PAI-1. 20 of these 35 contained
sole substitutions in the P3 or P2 position; 14 of the 35
contained simultaneous substitutions in the P3 and P2
positions; and only one of the 35 contained concurrent
replacements at all three positions. Ala, Gly, Thr, or Ser was present
at the P3 and P2 positions in 25 and 22 instances, respectively,
of the 35 PAI-1 variants exhibiting greater than 60% relative
inhibitory activity toward UK. Ser or Gly was present at the
P2 position in the seven most active PAI-1 variants containing
substitutions in both the P3 and P2 positions.

The 135 PAI-1 species that were active towards UK were
also monitored for their abilities to inhibit t-PA. The data,
shown in Table I, clearly indicate that many of the PAI-1
variants are differentially active against UK and t-PA.
Furthermore, 84 PAI-1 variants retained greater than 60% of the
inhibitory activity of native PAI-1 toward t-PA; this number
is 2.4-fold higher than that observed for UK. A similar
functional screen of the PAI-1 variant expression library using
t-PA instead of UK yielded approximately four times more
zones of retarded fibrinolysis (data not shown).

Recombinant native PAI-1 and selected PAI-1 variants
were partially purified from bacterial cell lysates using
Zn-chelate-Sepharose chromatography. The PAI-1 species were
activated with guanidine and their functional molarities were
determined by titration against known amounts of UK or
t-PA. The apparent k_assoc values of these PAI-1 species, HT1080
PAI-1, and yeast-derived recombinant PAI-1 were determined
with both UK and t-PA using a continuous fluorometric assay
(Table II). The apparent k_assoc constants exhibited by the E.
coli-derived native PAI-1 for t-PA and UK are similar to one
another. Likewise, they are similar to those determined with
yeast-derived native PAI-1 but slightly less than those
obtained with HT1080 PAI-1. This difference between
yeast-derived PAI-1 and HT1080 PAI-1 was previously noted (17)
and is probably due to the absence of carbohydrate on the
inhibitor expressed in yeast.

The apparent k_assoc constants obtained with the selected
PAI-1 variants (Table II) were consistent with the relative
inhibitory activities shown in Table I and confirmed that
certain amino acid substitutions in the P3 and P2 positions of
PAI-1 selectively crippled inhibitory activity against UK or
t-PA. For example, replacement of Ala345 in PAI-1 with Arg
or Lys resulted in larger decreases in the apparent k_assoc
constants with t-PA than with UK. On the other hand, several
of the chosen PAI-1 variants (Tyr344; Val344, Pro345; Val344,
Ser345; Tyr344, Ser345; Tyr344, Gly345) exhibited apparent k_assoc
values that were significantly diminished for UK but
essentially unchanged for t-PA. Two of these variants (Tyr344, Ser345
and Tyr344, Gly345) displayed an apparent k_assoc constant that
was somewhat higher than that of the E. coli-derived native
inhibitor. The most impressive example of altered specificity
is the Val344, Pro345 double-substitution variant. Its apparent
k_assoc value with t-PA was similar to that of native PAI-1, but
the apparent k_assoc value with UK was decreased by
approximately 23-fold.
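
The selectivity comparison in this paragraph amounts to taking the ratio of the two association rate constants for each variant. The sketch below shows that arithmetic for three entries of Table II. It rests on an assumption that should be stated plainly: the exponents in the scanned table are illegible, so the powers of ten used here were chosen to be consistent with the t-PA/UK ratio column and with the text, and the dictionary and function names are illustrative.

```python
# Apparent association rate constants (M^-1 s^-1) as (t-PA, UK).
# Mantissas are from Table II; exponents are ASSUMED (illegible in the scan)
# and chosen to agree with the tabulated t-PA/UK ratios.
K_ASSOC = {
    "SAR": (7.5e6, 7.4e6),   # E. coli-derived native sequence
    "SRR": (5.4e5, 2.0e6),   # Ala345 -> Arg
    "VPR": (7.0e6, 3.1e5),   # Val344, Pro345 double substitution
}


def selectivity(variant):
    """Ratio of the t-PA constant to the UK constant for one variant."""
    k_tpa, k_uk = K_ASSOC[variant]
    return k_tpa / k_uk


for name in K_ASSOC:
    print(f"{name}: t-PA/UK = {selectivity(name):.1f}")
```

A ratio near 1 indicates no preference, a ratio well above 1 indicates preferential inhibition of t-PA, and a ratio below 1 indicates preferential inhibition of UK.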

DISCUSSION 

Amino acid substitutions in the P3, P2, and P1 positions of
PAI-1 can have a profound effect on the ability of the inhibitor
to inactivate UK and t-PA. A striking paradigm revealed by
this study is that the activity of PAI-1 toward UK is
contingent on the presence of Arg or, less frequently, Lys in the P1
position. Concurrent changes in the neighboring P3 and P2
residues do not circumvent the strict necessity for a basic
amino acid residue in the P1 position. This obligatory
requirement for Arg or Lys is concordant with the presence of an
Arg in the P1 position of plasminogen (21), the preferred
substrate of UK and t-PA.

Amino acid substitutions in the P3 and P2 positions of PAI-1
were generally well tolerated by UK, although not all
replacements were allowed. 14 of 19 possible P3 variants, 16 of
19 possible P2 variants, and 90 of 361 possible P3 and P2
Table I

Inhibitory activity of the unique PAI-1 species toward UK (column U) and t-PA (column T)
[...] The inactive PAI-1 variants are grouped separately.
[Table I body not recoverable from the scan: variant labels and column assignments are lost. Surviving fragments, in scan order:]
INACTIVE VARIANTS
99, 52, 106, 75, 40, 72, 100, 109, 96, 91, 91, 46, 64, 3D, 62, 84, 24, 85, 17, 53, 61, 22, 24, 109, 66, 9, 61, 84, 45, 91, 10, 9, 37, 29, 46
KHI, HSp, QTD, txsv, DTH, DRP, KO?, WW, RP?, HKG, RCI?, BET, H.Q



double variants were identified as active inhibitors of UK by
[...]
containing the reactive [...] residues,
Arg-Met, in the original P3-P2 or P2-P1 sites were not identified
by the functional screen. Hence, there appears to be a
positional requirement for the P1 and P1' residues along
the putative reactive site loop (5). In contrast, the reactive
site loop structure of another serpin inhibitor, α2-antiplasmin,
contains separate, although overlapping, reactive sites for
trypsin and chymotrypsin (22).



The amino acids in the P3 and P2 positions of the PAI-1
variants that are preferred by UK were better [...]
when the PAI-1 variants were ranked with respect to their
relative inhibitory activities. The [...] PAI-1
variants necessitated that a simple "fixed [...]"
be used to determine inhibitory activity. A possible pitfall of
this assay, apart from its relative insensitivity to the
inhibition kinetics, stems from the existence of [...]
states of PAI-1 (18). Each variant or native PAI-1 preparation
was activated with guanidine prior to assay of inhibitory
Fig. 2. Heterologous expression of human PAI-1 in E. coli.
Lysates from E. coli cultures harboring the expression vector for
native PAI-1 (lane 1) or the parent vector lacking the PAI-1 cDNA
(lane 2) were fractionated by sodium dodecyl sulfate-polyacrylamide
gel electrophoresis. Purified recombinant PAI-1 synthesized in yeast
(120 ng, lane 3) and purified PAI-1 secreted from HT1080
fibrosarcoma cells (115 ng, lane 4) were also electrophoresed. PAI-1 was
detected by immunoblot analysis using an anti-PAI-1 murine
monoclonal antibody and alkaline phosphatase-conjugated goat anti-mouse
IgG. Protein relative molecular mass markers (30.0 and 21.5), shown
in kilodaltons, are as indicated.

Table II

Association rate constants for the interactions between PAI-1 species
and t-PA or UK

PAI-1 species: nPAI-1, native PAI-1 secreted from HT1080
fibrosarcoma cells; rPAI-1, recombinant yeast-derived native PAI-1; the
E. coli-derived PAI-1 species are designated by one-letter amino acid
code triplets corresponding to the identity of the amino acids in the
P3, P2, and P1 positions. Association rate constants with UK or t-PA
were determined as described under "Experimental Procedures."

PAI-1 species   t-PA (M^-1 s^-1)        UK (M^-1 s^-1)          t-PA/UK
nPAI-1          (9.2 ± 3.1) × 10^6      (11.7 ± 0.8) × 10^6     0.8
rPAI-1          (6.2 ± 0.5) × 10^6      (6.4 ± 0.7) × 10^6      1.0
SAR             (7.5 ± 0.8) × 10^6      (7.4 ± 0.4) × 10^6      1.0
SRR             (5.4 ± 0.3) × 10^5      (2.0 ± 0.2) × 10^6      0.3
SKR             (7.6 ± 0.1) × 10^5      (1.3 ± 0.3) × 10^6      0.6
VPR             (7.0 ± 3.1) × 10^6      (3.1 ± 0.3) × 10^5      23
VSR             (7.0 ± 3.1) × 10^6      (1.7 ± 0.1) × 10^6      4
YAR             (6.3 ± 1.7) × 10^6      (1.9 ± 0.1) × 10^6      3.3
YSR             (12.1 ± 1.5) × 10^6     (1.3 ± 0.1) × 10^6      9.3
YGR             (10.7 ± 2.0) × 10^6     (1.6 ± 0.1) × 10^6      6.7


activity. Hence, the potency differences shown in Table I
could reflect large variations in the relative amounts of active
and latent forms of the inhibitor among the activated PAI-1
preparations. However, the apparent rate constants of
selected PAI-1 variants (Table II), which were based on careful
titration of active inhibitor, mirror the relative inhibitory
activities displayed in Table I. We conclude that disparate
amounts of active and latent forms of PAI-1 were not primarily
responsible for the broad spectrum of inhibitory activities
exhibited by this set of PAI-1 variants.

Certain inferences can be drawn from our data with respect
to the topology of the active site region in UK. As stated
previously, the S1 subsite of UK (as well as t-PA) displayed a
fastidious requirement for the presence of a basic amino acid
residue in the P1 position of the inhibitor. Furthermore, the
35 most-active PAI-1 variants against UK contained Ala, Gly,
Thr, or Ser at the P3 and P2 positions in 25 and 22 instances,
respectively. These four amino acids are contiguously and
centrally located on an empirical hydrophobicity scale derived
from the average area that each amino acid buries upon
folding in globular proteins of known structure (23). Hence,
the S3 and S2 subsites of UK apparently prefer amino acids
that fall within a narrow range of intermediate hydrophobicity.

Comparison of the inhibitory activities of the PAI-1 variants
against UK and t-PA reveals that t-PA is more tolerant
than UK of amino acid substitutions in the reactive site region
of PAI-1. Of the 134 active PAI-1 variants, 35 and 84 exhibited
greater than 60% of the inhibitory activity of native PAI-1
toward UK and t-PA, respectively. The apparent topological
dissimilarities between t-PA and UK in their active site
regions are convincingly shown by the marked differential
inhibition of these two plasminogen activators by the PAI-1
variants. Without exception, the single-substitution P3
variants that were less active than native PAI-1 toward UK did
not exhibit diminished activities against t-PA. In contrast to
UK, t-PA was similarly tolerant of variation in the P3 position
of PAI-1 variants also containing substitutions such as Ser,
Gly, or Thr in the P2 position. We postulate that an S3 subsite
may not exist in t-PA for the binding of PAI-1 or, if it does
exist, would readily accommodate a larger variety of amino
acid side chains than the S3 subsite of UK. The ligand
preference of the S2 subsite of t-PA was relatively broad
although, unlike UK, there appears to be a bias against basic
amino acids.

Assuming that all possible unique PAI-1 cDNAs were
equally represented in the expression library, then statistical
analysis 3 predicts that greater than 91% of the 32,768 possible
PAI-1 cDNAs were present in the sampling of 80,000
recombinant bacteriophages screened with the fibrin indicator gel.
Nevertheless, our evaluation of the amino acids tolerated at
the reactive site region in PAI-1 is not as complete as
predicted from the above analysis for a variety of reasons. First,
the large number of positives encountered during the screen
precluded analysis of each one. Second, many plaques
containing a less-active PAI-1 variant could be missed due to the
transitory nature of weaker signals upon continuing
development of the indicator gel. Third, bias could be imposed upon
the library due to differences in the growth rates or plaque
sizes of certain recombinant bacteriophage clones. In spite of
these reservations, the presently described screening strategy
provided a broad and insightful survey of the consequences of
a vast number of amino acid substitutions in the vicinity of
the reactive site of PAI-1.
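
The "greater than 91%" figure can be checked with a simple expected-coverage calculation. The sketch below is not the authors' method: their footnote states that a confidence bound was obtained with the Chebyshev inequality, whereas this example only computes the expected fraction of distinct sequences sampled at least once, assuming equal representation and independent sampling. It also assumes that 32,768 arises from 32 codons at each of three randomized positions (32³); the function name is illustrative.

```python
import math


def expected_coverage(library_size, clones_screened):
    """Expected fraction of distinct sequences seen at least once.

    Assumes every sequence is equally likely and clones are sampled
    independently (sampling with replacement).
    """
    return 1.0 - (1.0 - 1.0 / library_size) ** clones_screened


# Assumption: 32 codons per randomized position, three positions.
LIBRARY_SIZE = 32 ** 3
CLONES_SCREENED = 80_000

COVERAGE = expected_coverage(LIBRARY_SIZE, CLONES_SCREENED)
# Poisson approximation of the same quantity, for comparison.
COVERAGE_POISSON = 1.0 - math.exp(-CLONES_SCREENED / LIBRARY_SIZE)

print(f"library size: {LIBRARY_SIZE}")
print(f"expected coverage: {COVERAGE:.3f} (Poisson approx. {COVERAGE_POISSON:.3f})")
```

Both forms give an expected coverage slightly above 0.91, in line with the value quoted in the text.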

In conclusion, our results have demonstrated the importance
of P3, P2, and P1 residues in PAI-1 to inhibitory efficacy
towards UK and t-PA. Furthermore, this study highlights
dissimilar specificity restraints imposed by the active site
regions of these two plasminogen activators. The resourcefulness
of the combinatorial mutagenesis approach used herein
is clearly evident from the fortuitous, simultaneous amino acid
replacements in the P3 and P2 positions that gave rise to PAI-1
variants (such as Val344, Pro345) exhibiting a marked
preference for t-PA over UK. The PAI-1 variants that preferentially
inhibited t-PA are of special interest in light of the
ability of PAI-1 to rapidly reverse the bleeding tendency in
rabbits following the combined administration of t-PA and
aspirin (24). Accordingly, PAI-1 variants that are relatively
inert towards UK may be safer antidotes for the treatment of
t-PA toxicity. The functional screen described herein will
continue to be a valuable tool in probing additional regions
in PAI-1 and, after suitable adaptations, this method could
provide insight into analogous interactions between other
inhibitors and their target proteases.

3 A 95% confidence upper bound on the number of unique PAI-1
cDNAs represented in the population was obtained using the
Chebyshev inequality (26).

ABSTRACT A method for simultaneously engineering
multiple properties of a protein, based on the observed
additivity of effects of individual mutations, is presented. We show
that, for the gene V protein of bacteriophage f1, effects of
double mutations on both protein stability and DNA binding
affinity are approximately equal to the sums of the effects of the
constituent single mutations. This additivity of effects implies
that it is possible to deliberately construct mutant proteins
optimized for multiple properties by combination of appropriate
single mutations chosen from a characterized library.

One of the long-term goals of the study of mutational effects
on protein stability and activity is to devise a method by
which mutations can be rationally employed to alter or
"engineer" the properties of proteins with predictable
results. Recombinant DNA technology has allowed the
construction of proteins of altered stability in vitro (1-16),
catalytic efficiency (17-19), substrate specificity (20-23), and
resistance to in vivo thermal inactivation (24) through the use
of single or multiple amino acid substitutions. This effort has
been greatly helped by the fact that the effects of amino acid
substitutions on such properties of proteins tend to be
additive as mutations accumulate, provided that the
substituting residues do not interact functionally or by direct
contact (25). For example, additive increases in the stability
of subtilisin BPN' have been achieved by combining
mutations at six sites in the protein tertiary structure (16). The six
mutations individually stabilize the protein by 0.3-1.3
kcal/mol, and the individual effects sum to a stability increase of
3.8 kcal/mol predicted for the hexa-mutant. The observed
stabilization of the mutant containing all six substitutions is
4.3 kcal/mol (16). Additive effects of amino acid substitutions
have been used to engineer incremental increases in the
stability of other proteins as well, including the N-terminal
domain of λ repressor (5, 13), T4 lysozyme (9, 12), kanamycin
nucleotidyltransferase (6), and neutral protease (26). This
strategy of additive mutation has also been employed to alter
binding affinities or specificities of proteins, such as λ
repressor (20), subtilisin (21-23), and glutathione reductase
(27), for their substrates or cofactors, and to alter the pH
profile of subtilisin (17).

A factor complicating the effort to engineer proteins by
mutation is that most single amino acid substitutions alter
multiple properties of the proteins in which they are made. To
be functional, a protein must be at once stable, yet flexible,
with high catalytic activity balanced against substrate
specificity. Because mutants affecting only one of these
properties are relatively rare, it appears difficult to optimize one
characteristic of a protein through mutations while maintaining
adequacy in the others. However, the observation that
mutational effects on the in vitro properties of proteins are
frequently additive suggests that it may be possible to
counteract deleterious side effects of desirable (or primary)
mutations by including additional mitigating (or secondary)
mutations. If the effects of combining multiple mutations
display simple additivity, then the net effect of the primary
and secondary mutations on a given in vitro property of the
subject protein should be the algebraic sum of the effects
observed in the starting mutants. The objective of the present
study is to test this assumption and to explore the possibility
of creating proteins that have been optimized with respect to
multiple in vitro properties.

The publication costs of this article were defrayed in part by page charge
payment. This article must therefore be hereby marked "advertisement"
in accordance with 18 U.S.C. §1734 solely to indicate this fact.

The gene V protein of bacteriophage f1 is a small
single-stranded DNA (ssDNA) binding protein that lends itself to
this goal because its DNA binding affinity and stability can be
readily measured in vitro and mutants of the gene V protein
are readily available. A plasmid-based mutagenesis and
expression system allows the rapid production of proteins
containing single and multiple substitutions (28, 29).
Conditions for monitoring gene V protein stability in vitro have
been established (30), allowing quantitative assessment of the
effects of single and multiple substitutions on stability (3, 4,
31). The stability of wild-type (WT) and mutant gene V
proteins can be estimated as a function of their resistance to
guanidine hydrochloride (Gdn-HCl)-induced denaturation,
monitored by the disappearance of tyrosine circular dichroism
(CD) at 229 nm as the protein unfolds (30). Cooperative
binding of the protein to its substrate, ssDNA, can be
followed in vitro by monitoring the intrinsic tyrosine
fluorescence of the protein (32). This fluorescence is quenched as
the protein binds to ssDNA and is restored when the
protein-ssDNA complex is dissociated by the addition of NaCl.
Binding affinities of WT gene V protein to a variety of
substrates and of WT and several mutant gene V proteins to
the substrate polydeoxyadenylic acid have been reported (31,
33-36). Thus the gene V protein provides a system in which
proteins containing amino acid substitutions may be readily
obtained and characterized in vitro. We used this system,
starting with well-characterized single-substitution mutants,
to construct doubly substituted proteins displaying predictable
and additive changes to both DNA binding affinity and
stability.

MATERIALS AND METHODS 
Mutagenesis, Strains, and Vectors. Mutagenesis of gene V 
was carried out in the plasmid pTT18 as described (28, 29). 
Single mutants were constructed by oligonucleotide-directed

Abbreviations: Gdn-HCl, guanidine hydrochloride; ssDNA,
single-stranded DNA; WT, wild type. Substitutions are described in the
one-letter code; e.g., Y41F denotes the replacement of tyrosine at
position 41 by phenylalanine.

†Present address: Pritzker School of Medicine, University of
Chicago, Chicago, IL 60637.
‡To whom reprint requests should be addressed.
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V protein 


Mutation(s)    Minimum          $\Delta\Delta G^{\circ}_{u,2M}$,    $\Delta\Delta G^{\circ}_{d,0.15M}$,
               separation, Å    kcal/mol                            kcal/mol

I6V                             -0.68                                0.21
F13T                            -0.67                               -0.65
L28V                             1.09                               -0.06
E30F                             1.97                                2.29
L32Y                             1.04                                0.37
C33M                            -3.49                               -0.45
C33V                            -0.18                               -0.09
V35A                            -2.25                               -0.45
V35C                            -1.45                               -0.62
V35F                            -3.21                               -1.00
V35I                            -0.68                               -0.24
V35L                            -2.72                               -0.47
V35M                            -1.10                               -0.99
Y41F                            -0.62                               -3.12
V45C                            -0.05                                0.15
I47C                            -5.30                               -0.29
I47F                            -2.02                                0.85
I47L                            -0.67                                0.03
I47M                            -2.21                                0.61
I47V                            -2.62                               -0.25
H64C                             0.50                               -0.12
L65P                            -1.47                               -0.73
F68L                            -4.30                               -0.26
F73W                             0.76                                1.21
M77I                             1.63                                0.10
M77V                             1.23                                0.06
R82C                            -1.50                               -2.74
A86T                            -0.66                               -0.26
A86V                             0.47                               -0.25
I6V/E30F       10.7              0.71                                2.27
I6V/M77I       17.3              0.34                                0.17
I6V/M77V       17.3             -0.04                                0.27
F13T/E30F       5.8              1.59                                1.03
L28V/F68L      19.0             -3.46                               -0.84
E30F/A86T      21.6              1.17                                1.93
E30F/A86V      21.6              2.05                                1.94
L32Y/R82C      15.6              0.15                               -1.78
C33V/V35C       4.0             -1.57                               -0.57
C33M/I47C       3.9             -5.73                                0.29
V35A/I47F       6.4             -3.67                                0.14
V35A/I47L       6.4             -2.97                               -0.50
V35A/I47M       6.4             -4.51                                0.10
V35A/I47V       6.4             -4.52                               -0.89
V35C/I47C       6.4             -7.20                               -0.79
V35F/I47L       6.4             -4.22                               -1.28
V35I/I47F       6.4             -2.08                                0.30
V35I/I47L       6.4             -1.18                               -0.29
V35I/I47M       6.4             -2.85                                0.46
V35I/I47V       6.4             -3.12                               -0.51
V35L/I47F       6.4             -4.00                                0.24
V35L/I47L       6.4             -3.58                               -0.51
V35L/I47M       6.4             -5.41                               -0.08
V35L/I47V       6.4             -5.10                               -1.03
V35M/I47F       6.4             -2.35                                0.07
V35M/I47L       6.4             -1.70                               -1.08
V35M/I47M       6.4             -3.62                               -0.48
Y41F/F73W      21.0             -0.66                               -0.73
V45C/R82C      11.7             -1.05                               -2.44
H64C/F68L       7.0             -4.07                               -0.80
L65P/F68L       4.5             -4.25                               -0.83


For double mutants, the closest pair of atoms (minimum separa- 
tion) in the two side chains in the crystal structure of the WT gene 
V protein dimer (M. M. Skinner, H. Zhang, D. H. Leschnitzer, Y. 
Guan, H. Bellamy, R. M. Sweet, C. W. Gray, R. N. H. Konings, A. 
H.-J. Wang, and T.C.T., unpublished data) is listed. In two cases,



mutagenesis and isolated as derivatives of pTT18. Double
mutants were obtained by oligonucleotide-directed mutagenesis,
recombination of single mutants by the use of intervening
restriction sites, or selection (as intragenic suppressors of
a conditional lethal mutation) from a pool of random
single-amino acid substitution mutants (31). Mutant genes were
expressed in Escherichia coli K561 (37). Transformation of E.
coli K561 with pTT18 derivatives was effected using an
electroporation device.

Protein Purification. Growth of K561 cultures transformed
with pTT18 derivatives encoding gene V protein variants and
purification of proteins were carried out as described (30, 31).
To confirm that the mutant proteins contained the expected
amino acid substitutions, ssDNA was isolated from E. coli
harvested late in the growth and the gene V region was
sequenced.

Measurement of ssDNA Binding Affinity. NaCl-induced
dissociation of gene V protein-ssDNA complexes, monitored by
fluorescence, was used to estimate the binding affinities of WT
and mutant gene V proteins for the substrate polydeoxyadenylic
acid as described (refs. 31-36 and unpublished observations).
Data are reported as the apparent free energy change
upon dissociation in 0.15 M NaCl ($\Delta G^{\circ}_{d,0.15M}$), related to the
effective binding constant in 0.15 M NaCl ($K\omega_{0.15M}$) by
$\Delta G^{\circ}_{d,0.15M} = +RT \ln(K\omega_{0.15M})$ (where R = 1.987 cal/mol·K;
T = 298 K). Differences in binding between mutants are
expressed as differences in free energy change upon dissociation
($\Delta\Delta G^{\circ}_{d,0.15M}$), defined as [$\Delta G^{\circ}_{d,0.15M}$ (mutant) - $\Delta G^{\circ}_{d,0.15M}$
(WT)]. Mutants binding more tightly to ssDNA than the WT
will have positive values of $\Delta\Delta G^{\circ}_{d,0.15M}$. Error estimates (2 SD)
were obtained from seven measurements of the DNA binding
affinity of the WT gene V protein, leading to an error of ±0.1
kcal/mol for $\Delta\Delta G^{\circ}_{d,0.15M}$.
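The relation between the effective binding constant and the reported free energies can be illustrated with a minimal sketch. It assumes only the definitions given above (R = 1.987 cal/mol·K, T = 298 K); the function names and the binding constants in the usage lines are hypothetical and are not values from this study.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K), i.e. 1.987 cal/(mol*K)
T_KELVIN = 298.0    # temperature stated in the text


def dissociation_free_energy(k_omega: float) -> float:
    """Apparent free energy of dissociation, dG_d = +RT ln(K_omega), in kcal/mol."""
    return R_KCAL * T_KELVIN * math.log(k_omega)


def binding_change(k_omega_mutant: float, k_omega_wt: float) -> float:
    """ddG_d = dG_d(mutant) - dG_d(WT); positive means tighter binding than WT."""
    return (dissociation_free_energy(k_omega_mutant)
            - dissociation_free_energy(k_omega_wt))


# Hypothetical binding constants (M^-1), chosen only for illustration.
k_wt = 1.0e6
k_mutant = 1.0e7
print(f"ddG_d = {binding_change(k_mutant, k_wt):.2f} kcal/mol")
```

With these definitions a tenfold increase in the effective binding constant corresponds to roughly 1.4 kcal/mol, which gives a sense of scale for the binding changes listed in Table 1.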

Measurements of Protein Stability. Stability measurements
on mutant gene V proteins were carried out as described (30).
The gene V protein is reversibly denatured by Gdn-HCl, and
the denaturation can be monitored by the disappearance of a
tyrosine CD signal at 229 nm. Unfolding data were fitted to
a two-state model (30) with modifications (4) in the case of
proteins for which the unfolding is >50% complete when
[Gdn-HCl] < 1.5 M. Stabilities are expressed as free energy
changes upon unfolding in kcal/mol of dimeric protein. The
stability ($\Delta G^{\circ}_{u,2M}$) of the WT gene V protein, given as the
average ± 2 SD of 10 measurements, is 9.04 ± 0.3 kcal/mol
of dimeric protein. Stabilities of mutants ($\Delta G^{\circ}_{u,2M}$) are
compared to that of the WT in the presence of 2.0 M Gdn-HCl to
yield the difference ($\Delta\Delta G^{\circ}_{u,2M}$). The estimated error in values
of stability changes of mutants, relative to that of the WT
protein ($\Delta\Delta G^{\circ}_{u,2M}$), is ±0.4 kcal/mol. Stabilities of mutants at
positions 35, 47, 28, 64, 65, and 68 have been reported (4, 31,
38) and are taken from those works.

RESULTS AND DISCUSSION 

Mutants of the Gene V Protein. The data compiled in Table 
1 include stability and DNA binding affinity measurements 
for a variety of single mutants of the gene V protein and for 
a series of double mutants constructed by combination of 



Ile-6/Met-77 and Tyr-41/Phe-73, the two side chains are within
separate monomers, and in all other cases they are within the same
monomer of the protein. Changes in stability, measured as the
change in free energy upon unfolding ($\Delta\Delta G^{\circ}_{u,2M}$), are given in kcal/mol
of dimeric protein, relative to WT. Mutants with increased stabilities
have positive values of $\Delta\Delta G^{\circ}_{u,2M}$, and the magnitude corresponds to
making the same substitution twice. Changes in apparent free
energies of dissociation from polydeoxyadenylic acid ($\Delta\Delta G^{\circ}_{d,0.15M}$),
relative to WT gene V protein, are given in kcal/mol; positive values
of $\Delta\Delta G^{\circ}_{d,0.15M}$ indicate enhanced binding of the mutant to ssDNA
relative to WT.
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these single mutations. These data were used to explore the 
utility of concomitantly engineering in vitro stability and 
DNA binding affinity of the gene V protein through the use
of additive mutational effects. Logically, this analysis
consists of two steps: (i) assessment of the additivity of
mutational effects on DNA binding and stability for the mutants
listed in Table 1 and (ii) comparisons of the combined
properties of a group of double mutants with those found in
the starting group of single mutants.

Additivity of Mutational Effects on the Gene V Protein. It
has been observed that the effects of accumulating mutations
on in vitro properties of proteins are very nearly additive,
provided that the substituted residues do not interact, either
functionally (as in the case of catalytic residues) or by direct
contact (25). When the substituted residues in multiply
substituted proteins do interact or disrupt an interaction present
in the WT protein, the effects are not additive. This is a
potential stumbling block to protein engineering by
accumulating mutations because the in vitro properties of such
multiple mutants could not be predicted from the effects of
the single mutants taken alone. Fortunately, interaction
between residues in proteins appears to be relatively rare,
with the exception of catalytic residues (25). We assessed the
additivity of stability changes among single and double
mutants of the gene V protein by considering the stabilities of
double mutants, along with the stabilities of their constituent
single mutants, and comparing these to the stability of the
WT. Fig. 1 compares the stability changes ($\Delta\Delta G^{\circ}_{u,2M}$), relative
to WT, of the double mutants listed in Table 1 with the
sums of the stability changes of their constituent single
mutants. The data generally fall along a straight line with a
slope near unity, demonstrating that interactions between
substituting residues appear to be minimal in the combinations
tested. This simple additivity was unexpected for pairwise
substitutions of residues at positions 35 and 47 of the
gene V protein due to the close proximity of these sites in a
published crystallographic model of the protein (39). However,
NMR results (40) and a revised crystal structure
determined in our laboratory (Skinner et al., unpublished
data) show that, instead, Ile-47 is near Cys-33. This in turn
may explain the significantly nonadditive stability effect of
combining the mutation C33M with I47C (Table 1 and Fig. 1).
This double mutant lies far from the line described by the rest
of the combinations shown in Fig. 1.
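The additivity test described in this paragraph amounts to comparing each measured double-mutant value with the sum of its single-mutant values. The sketch below does this for three pairs using stability changes copied from Table 1. The cutoff used to flag a pair as nonadditive is an assumption made for illustration (the ±0.4 kcal/mol error from the Methods, propagated over three independent measurements); it is not a criterion stated by the authors.

```python
import math

# Stability changes (ddG_u,2M, kcal/mol of dimer), copied from Table 1.
SINGLE = {
    "C33M": -3.49, "I47C": -5.30,
    "V35I": -0.68, "I47L": -0.67,
    "E30F": 1.97, "A86V": 0.47,
}
DOUBLE = {
    ("C33M", "I47C"): -5.73,
    ("V35I", "I47L"): -1.18,
    ("E30F", "A86V"): 2.05,
}

# Assumed cutoff: the 0.4 kcal/mol error of one measurement, propagated over
# the three measurements (two singles, one double) entering each comparison.
CUTOFF = math.sqrt(3) * 0.4


def additivity_deviation(pair):
    """Measured double-mutant change minus the sum of the single-mutant changes."""
    predicted = sum(SINGLE[m] for m in pair)
    return DOUBLE[pair] - predicted


def is_additive(pair):
    """True when the deviation lies within the assumed error cutoff."""
    return abs(additivity_deviation(pair)) <= CUTOFF


for pair in DOUBLE:
    label = "additive" if is_additive(pair) else "nonadditive"
    print("/".join(pair), f"{additivity_deviation(pair):+.2f} kcal/mol", label)
```

Under this assumed cutoff only C33M/I47C is flagged, in line with its position away from the unit-slope line in Fig. 1.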




Fig. 1. Additivity of mutational effects on gene V protein
stability. The stability change ($\Delta\Delta G^{\circ}_{u,2M}$), relative to the WT protein, of
gene V protein double mutants is shown on the y axis. The x axis
shows the sum of the stability changes (kcal/mol), also relative to the
WT protein, of the constituent single mutants. Positive values of
$\Delta\Delta G^{\circ}_{u,2M}$ indicate proteins with increased stability. The combination
of the mutants C33M and I47C is indicated by the diamond (♦). A line
with unit slope is shown for reference.



Analogous to the results for stability changes, the DNA
binding affinity changes ($\Delta\Delta G^{\circ}_{d,0.15M}$), relative to WT, of the
double mutants tested are generally the sums of the binding
affinity changes of their constituent single mutants, as shown
in Fig. 2.

Presumably, some combinations of single mutations in 
addition to the C33M/I47C double mutant will lead to inter- 
actions between substituting residues. However, the fre- 
quent observation of simple additivity for the pairs of sites 
studied in gene V protein (Figs. 1 and 2), and the stepwise 
accumulation of stability changes observed in other proteins, 
including variants containing up to six substitutions (16, 25), 
suggests that interactions between substituents may be rel- 
atively rare as long as the sites chosen as targets for substi- 
tution are not obviously related by proximity or function (as 
in the case of catalytic residues). Thus, the additivity of 
effects of substitutions shown in Figs. 1 and 2 suggests that 
it should be possible to alter both gene V protein DNA 
binding affinity and stability in an additive and predictable 
fashion by combining previously characterized single mu- 
tants. 

Engineering the Gene V Protein. To simultaneously adjust 
gene V protein DNA binding affinity and stability by multiple 
mutagenesis, it is important to know the relationship between 
the binding affinity changes and stability changes of the 
starting single mutants. If the two properties are strongly 
correlated, then the stability change caused by a mutation 
will always be in the same direction, relative to the WT, as 
the DNA binding affinity change, restricting the range avail- 
able in one parameter (binding affinity or stability) relative to 
the other. On the other hand, if DNA binding affinity changes 
are loosely correlated or uncorrelated with stability changes
in the starting group of single mutants, then it should be 
possible to generate mutants whose stability changes range 
widely with respect to DNA binding affinity changes. In this 
case, the DNA binding affinity and stability of the gene V 
protein can be altered simultaneously yet independently of 
each other, simply by combining single mutants to give the 
desired changes in each parameter. 

To assess the correlation between DNA binding affinity
changes and stability changes in the starting group of single
mutants, $\Delta\Delta G^{\circ}_{d,0.15M}$ (the DNA binding affinity change) is
plotted against $\Delta\Delta G^{\circ}_{u,2M}$ (the stability change) in Fig. 3A.
Positive values of $\Delta\Delta G^{\circ}_{u,2M}$ or $\Delta\Delta G^{\circ}_{d,0.15M}$ indicate higher
stability or increased DNA binding affinity, respectively,




Fig. 2. Additivity of mutational effects on gene V protein DNA
binding affinity. Binding affinity change ($\Delta\Delta G^{\circ}_{d,0.15M}$), relative to the
WT protein, of gene V protein double mutants is shown on the y axis.
The x axis shows the sum of the binding affinity changes (kcal/mol),
also relative to the WT protein, of the constituent single mutants. A
positive value of $\Delta\Delta G^{\circ}_{d,0.15M}$ indicates enhanced binding to ssDNA
relative to WT. The combination of the mutants C33M and I47C is
indicated by the diamond (♦). A line with unit slope is shown for
reference.
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Fig. 3. (A) Comparison of ssDNA binding affinity changes 
($\Delta\Delta G^{\circ}_{d,0.15M}$) with stability changes ($\Delta\Delta G^{\circ}_{u,2M}$) for single-substitution
mutants shown in Table 1. The stability change, relative to the WT,
is plotted on the x axis while DNA binding affinity change is plotted
on the y axis. The upper right portion of the graph, therefore,
contains mutants with higher stability and DNA binding affinity than
the WT. (B) Comparison of ssDNA binding affinity changes with
stability changes for double-substitution mutants shown in Table 1.
(C) Calculated stability changes and ssDNA binding affinity changes
resulting from all possible pairwise combinations of single mutants
shown in Table 1, with assumptions as described in text.

than for the WT protein. Stability changes, relative to the WT
protein, are only weakly correlated with binding affinity
changes (also relative to WT), indicating that large changes of
each parameter with respect to the other can be achieved by
combining these mutants. Fig. 3A also shows graphically that
very few of the single mutants alter stability without a
concomitant DNA binding affinity change, or vice versa.
Most of the single mutations decrease both stability and DNA
binding affinity, but some mutations cause increases in one or
both parameters relative to WT, indicating that adjustments
of each parameter in either direction are potentially possible.

The results of combining single substitutions to alter two
properties of the gene V protein are shown in Fig. 3B. The
double mutants in Fig. 3B differ, as a group, in their
properties from the starting single mutants in Fig. 3A,
demonstrating that noncorrelated but additive changes in gene V
protein DNA binding affinity and stability can be used to
create a group of double mutants with distinctive in vitro
properties. The double mutants shown in Fig. 3B are only a
small fraction of the possible pairwise combinations of the
single mutants shown in Table 1. Fig. 3C shows the predicted
result, assuming that DNA binding affinity changes and
stability changes are always directly additive (meaning no
interactions between sites), of all possible pairwise
combinations of the single mutants shown in Table 1. Simple pairwise
combination of a starting group of 29 single-substitution
mutants at 17 sites leads potentially to pairs of proteins
differing by as much as 12 kcal/mol in stability or 8 kcal/mol
in DNA binding affinity without substantial changes in the
other parameter. Stability increases or DNA binding affinity
increases relative to the WT of 3 kcal/mol or both are
apparently within reach through pairwise combination of
single mutants as well. Potential examples are the double
mutants L32Y/L28V (increasing stability), I6V/F73W
(increasing DNA binding affinity), and L32Y/F73W or F73W/M77I
(increasing both parameters) (Table 1). Large
destabilizations and reductions in DNA binding could also be
achieved in theory. In practice, proteins with $\Delta\Delta G^{\circ}_{u,2M}$ more
negative than -7.5 kcal/mol are substantially unfolded at
25°C and are difficult to produce in vivo (unpublished
observations). Similarly, rapid purification of gene V protein
variants employs ssDNA affinity chromatography, which
may impose a lower limit on the obtainable reduction in DNA
binding affinity. Within the practical limits of expression and
purification, potentially hundreds of proteins with precisely
engineered DNA binding affinities and stabilities could be
produced by pairwise combination of a relative handful of
single-substitution mutants.
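The calculation behind Fig. 3C can be sketched as an enumeration of all pairs of single mutants at different sites, with each predicted property taken as the sum of the two single-mutant values. The values below are the single-mutant entries of Table 1. The strict-additivity assumption is the one stated in the text; the function names, and the exclusion of pairs at the same site, are choices made for this illustration.

```python
from itertools import combinations

# (stability change, binding change) in kcal/mol for the 29 single mutants
# of Table 1, relative to WT.
SINGLES = {
    "I6V": (-0.68, 0.21), "F13T": (-0.67, -0.65), "L28V": (1.09, -0.06),
    "E30F": (1.97, 2.29), "L32Y": (1.04, 0.37), "C33M": (-3.49, -0.45),
    "C33V": (-0.18, -0.09), "V35A": (-2.25, -0.45), "V35C": (-1.45, -0.62),
    "V35F": (-3.21, -1.00), "V35I": (-0.68, -0.24), "V35L": (-2.72, -0.47),
    "V35M": (-1.10, -0.99), "Y41F": (-0.62, -3.12), "V45C": (-0.05, 0.15),
    "I47C": (-5.30, -0.29), "I47F": (-2.02, 0.85), "I47L": (-0.67, 0.03),
    "I47M": (-2.21, 0.61), "I47V": (-2.62, -0.25), "H64C": (0.50, -0.12),
    "L65P": (-1.47, -0.73), "F68L": (-4.30, -0.26), "F73W": (0.76, 1.21),
    "M77I": (1.63, 0.10), "M77V": (1.23, 0.06), "R82C": (-1.50, -2.74),
    "A86T": (-0.66, -0.26), "A86V": (0.47, -0.25),
}


def site(mutation: str) -> int:
    """Residue number of a one-letter-code substitution such as 'Y41F'."""
    return int(mutation[1:-1])


def predicted_doubles(singles):
    """Predict (stability, binding) for every pair of mutants at different sites,
    assuming strictly additive effects (no interaction between sites)."""
    predictions = {}
    for a, b in combinations(singles, 2):
        if site(a) == site(b):
            continue  # two substitutions cannot occupy the same residue
        (stab_a, bind_a), (stab_b, bind_b) = singles[a], singles[b]
        predictions[f"{a}/{b}"] = (stab_a + stab_b, bind_a + bind_b)
    return predictions


doubles = predicted_doubles(SINGLES)
most_stable = max(doubles, key=lambda name: doubles[name][0])
tightest = max(doubles, key=lambda name: doubles[name][1])
print(len(doubles), "predicted double mutants")
print("largest predicted stability increase:", most_stable)
print("largest predicted binding increase:", tightest)
```

The enumeration yields several hundred candidate double mutants from 29 single mutants, which is the sense in which a relative handful of characterized substitutions can span a wide range of property pairings. The practical limits on expression and purification noted above are not modeled here.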

Potential Utility of Protein Engineering by Multiple
Mutation. Starting from a modest group of characterized
single-substitution mutants, we have created double mutants with
distinctive pairings of in vitro stability and DNA binding
affinity. The stabilities and DNA binding affinities of these
double mutants are generally additively related to the
stability and DNA binding affinity changes of the starting single
mutants. These results suggest that large numbers of proteins
with precisely tailored properties can be deliberately
constructed by the appropriate combination of
single-substitution mutants. The properties of the
double-substitution proteins can be predicted, based on those of the
starting mutants, if care is taken to ensure (as much as
possible) that the sites chosen for substitution will lead to
simple additive effects (25). Simple additivity may not occur
if the substituting residues contact each other, due to a
change in the energy of interaction between the two sites (25).
However, the potential interactions between amino acid
residues, with the exception of charge-charge interactions,
are strongly distance-dependent (41). Also, the effects of
amino acid substitutions on protein structure are often
localized to the immediate vicinity of the substitution (10, 42-45).
It has been observed that the effects of substitutions on the
properties of proteins are generally additive when the sites of
substitution are not in van der Waals contact with each other
(25). Consistent with these suggestions, nonadditivity of
stability effects in the gene V protein is observed for
mutations at sites that are close to each other (sites 33 and 47) but
not for more distant sites (sites 35 and 47).

The ability to alter multiple properties of the gene V protein 
by combining substitutions is potentially useful in the further 
characterization of the gene V protein as well. For example, 
some pairwise combinations of substitutions could lead to 
proteins that differ in sequence and in some properties, yet
that possess both WT stability and DNA binding affinity. 
Combinations of L32Y with V35I, I47L, or L65P might lead 
to proteins of this type (Table 1). These proteins could be 
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used to study the effects of substitutions on other properties 
of the gene V protein such as resistance to irreversible 
thermal denaturation
in the context of WT stability and DNA binding affinity.

Protein engineering through the combination of single- 
substitution mutants may be most successful at adjusting 
those properties of concern in vitro, rather than in vivo 
activity. This is because activity in vivo may involve sequen- 
tial interactions or parameters such as lifetime and folding/ 
unfolding rates not considered in the in vitro analysis of the
effects of substitutions. Mutations leading to DNA binding 
affinity and stability changes probably alter these other 
properties as well, complicating the task of engineering a 
particular property as the number of parameters to be main- 
tained near WT values increases. Nevertheless, engineered 
proteins may find many applications in vitro where defects in 
some properties may be acceptable, and the ability to rapidly 
adjust the in vitro properties of proteins by combining well- 
characterized single substitutions should facilitate future 
protein engineering efforts. 
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DNA sequence analysis: Single-stranded M13 DNA and
double-stranded plasmid DNA were sequenced according
to the method of Sanger, Nicklen and Coulson (1977)
and Hattori and Sakaki (1986).

Determination of β-galactosidase activities: Assays were
done exactly as described by Miller (1972), except that
cultures were grown in LB supplemented with the appropriate
antibiotics. For induction studies overnight cultures
were grown in the presence of 0.1 μg/ml tetracycline,
whereas 0.2 μg/ml were added to log cultures. All
measurements were repeated at least twice.

Media, enzymes and chemicals: Media and general
phage techniques have been described (Miller 1972;
Maniatis, Fritsch and Sambrook 1982). Antibiotics and
o-nitrophenyl-β-D-galactoside were obtained from Sigma, St.
Louis. Restriction endonucleases, E. coli DNA polymerase I
large fragment, T7 polymerase, calf intestine alkaline
phosphatase and T4 DNA ligase were purchased either from
New England Biolabs (Schwalbach), Pharmacia (Freiburg),
Boehringer (Mannheim) or BRL (Dreieich). ATP,
deoxyribonucleoside triphosphates and dideoxyribonucleoside
triphosphates were obtained from Boehringer (Mannheim).
Radiolabeled dATP (400 Ci/mmol) was purchased from Amersham
(Braunschweig). Oligonucleotides were synthesized using an
Applied Biosystems automated DNA synthesizer model
381A.

Molecular techniques: Mutagenesis of Tet repressor
positions 46 to 49 was accomplished by mutually primed
synthesis of degenerate oligonucleotides as detailed by Hill
(1989). The sequence of the oligonucleotide was 5'
?S£^5 CATGTAAAAAAT AACCGGGCCCTCCTCC^
ACGCGTCGAGC 3'. Bold letters (bases shown are wild
type) indicate that 6-7% each of the three non-wild-type
bases were added at these positions during synthesis of the
oligonucleotide.

Plasmids: Plasmid pWH410 contains a fusion of the tet
regulatory region to the lac operon (tetA-lacZ fusion). It was
derived from pMC1403 (Casadaban, Chou and Cohen
1980) and allows Δlac E. coli strains to grow on lactose as
the sole carbon source. Plasmid pWH414 differs in two
aspects from pWH410. First, it carries a tetR-lacI fusion
(Figure 2). Second, it contains a one base pair frameshift
mutation at the fusion of tetA and lacZ. This renders Δlac
E. coli strains unable to grow on lactose. Nevertheless,
phenotypical detection of β-galactosidase activity with X-Gal
is still possible.

Transdominance was analyzed in strains containing
pWHS55. This plasmid is a pBR322 derivative in which the


Figure 1. Regulation of the expression of the transposon
Tn10 encoded tetracycline resistance determinant. Both genes
tetA (encoding the resistance protein) and tetR (encoding the
Tet repressor) are indicated. Their divergent expression is
symbolized by wavy lines corresponding to the respective mRNAs.
The central tet regulatory region consists of several promoters
(not shown) and the two tet operators O1 and O2 represented by
hatched boxes. Tetracycline is indicated by the small rectangle
which binds to and induces Tet repressor. The Figure is
from Wissmann and Hillen (1989).

I 017 £*£ n was deleted yielding pWH806 and the
promoterless Tn10 tetR gene was inserted resulting in low
level constitutive expression (Müller-Hill, Crapo and
Gilbert 1968).

Plasmid pWH1411 was used for the cassette mutagenesis
and, as a derivative of pACYC177 (Chang and Cohen 1978),
is compatible with plasmids derived from pBR322. It confers
resistance to chloramphenicol and contains a constitutively
expressed tetR gene. To allow cloning of short oligonucleotide
cassettes between singular restriction sites, the sequence
of the tetR gene was altered without changing the encoded
protein sequence. pRT240 is similar to pWH1411, except
that it confers resistance to kanamycin and contains a
wild-type tetR gene (Bertrand et al. 1984; Meier, Wray and
Hillen 1988).
The pACYC177 derivatives pWH1200 and pWH1201
(Altschmied et al. 1988), pUC19 (Yanisch-Perron,
Vieira and Messing 1985), pWH4S3 (Meier, Wray and
Hillen 1988) and pMc5-8 (Stanssens et al. 1989) have
been described. Plasmid pWH1012 (Sizemore et al. 1990)
with divergent tetR-galK and tetA-lacZ transcriptional fusions
was used for quantitative analyses of Tet repressor binding
to tet operator in vivo.

^ h ?r J ?. con ^ nirtip,ls knd CTOa »a PWH46S was dlrated 
T'th NM .and Smal yielding a 1950-bp fragment with the 
entire galK gene. In addition, this fragment contains a 
segment of ISO bp with translation*! stops in all three 
i^dingframesB' ofthegeneandaXt* terminator following 
Oie 3 -end of galK. After fillmg in the protruding ends the 
fragment w^ c ]aned into //incH linearised -MISmp9. A 
andidatc «ath toe dependent transcription of galK was 
named mWH22, a second lac operator with the proposed 
ideal binding sequence for Lac repressor (Sadler, S^Most 
and Betz 1983) was cloned 19 bp upstream of the start 

ur^*!? 1 "!* , ' mi0 the sln & ,e Nfld 3ite of m\ra22 yielding 
mwres. j n this ttjnstrucdon palindromic centers of the 
two hcoperators are separated by 2S3 bp. The galK con- 
etmct from mWH2$ was recombined into the lac sequences 

1984). Since this phage carries the cI857 allele from λplac5,
E. coli strains lysogenized with this phage were grown at
temperatures below 33°.

The construction of phage λtet50 has been described
(Smith and Bertrand 1988). E. coli NK5031(λtet50) was
treated with mitomycin C and the resulting phage lysate
used to lysogenize E. coli WH207.

Selection of tenipenttue*sensitlvto Tex represser mu- 
tants; Mueagcniicxi pRT240 was transformed to £. cell 
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f 7 PWH4I0 and grown to saturation w 

2f«I lhC m K^toty region to the lac operon 

this step represents a selection again* binding of Tit n- 

oWH^ri'™^^^ to WH*W^WH») containing 
pWH4H. Tmnsfbrmaatt were^Wnto^uTiWn at 2»' 

m u dlU "? r t l? akcto4,! - Hcr <> «»• containing 
^nvativcs were isolated and mransformed to 

"on <it 42 m minimal medium using lactose. The pRTS40 
wuY!? 9 ^ plated, transformed to WH207 Hth 
£ IS ^l 0 ". * ,ucose rainl **I medium supple- 
mented vith ampiclllin, bnamyrin and X-GaL p] at « Sere 
mcubaced at 42- for 2 day* and blue colonies transferred 
to rmh plates cont»iniT^ the identical medium. la& phe- 
norypea were scored after incubadon at 2fi" for 2 day*. utR 
genes were recloned as Hindi fragments in pUC!9: From 
derivatives with lac promoter in Fusion*, EcoRl/Spkl DNA 
fragments containing utR were then inserted into the re- 
spective sites of pWHiSOO and pWHl 201. This yielded two 
sets of plasmids with pWHiioO derivatives directing a 
high, and pWHl20J derivatives directing a "low- 
constitutive expression of UtR in vivo (Berth and *t «/ 
1984). v 



RESULTS 

Selection of Tet repressor binding to tet operator:
The selection makes use of the tet-directed expression
of divergently arranged lacZ and lacI genes. Binding
of repressor to the tet operators turns off transcription
of both genes, resulting in lacZ− E. coli colonies. At the
same time, the absence of Lac repressor allows expression
of a galactokinase gene driven by the lac
regulatory region. This enables the E. coli strain to
use galactose as the sole carbon source. In the absence
of Tet repressor binding to tet operators, lacZ as well
as lacI are expressed. Lac repressor binds to the lac
operators and prevents transcription of galK. The cell
cannot utilize galactose as the sole carbon source for
growth and displays a lacZ+ phenotype.
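
The regulatory logic described above can be written as a small truth table. The sketch below is only an illustration: the function name `phenotype` is hypothetical, repression is treated as all-or-none, and IPTG is assumed to inactivate Lac repressor completely (the text reports only partial relief in practice).

```python
def phenotype(tet_repressor_binds: bool, iptg: bool = False) -> dict:
    """Idealized all-or-none model of the selection system (an assumption)."""
    # The tet operators control the divergently arranged lacZ and lacI fusions.
    lacZ_expressed = not tet_repressor_binds
    lacI_expressed = not tet_repressor_binds
    # Lac repressor blocks galK unless it is absent or inactivated by IPTG.
    lac_repressor_active = lacI_expressed and not iptg
    galK_expressed = not lac_repressor_active
    return {
        "lacZ": "+" if lacZ_expressed else "-",
        "growth_on_galactose": galK_expressed,
    }


# Tet repressor bound: white colonies that grow on galactose.
print(phenotype(True))
# No Tet repressor binding: blue colonies that cannot use galactose.
print(phenotype(False))
```

In this simplified model, cells with a functional repressor-operator pair are the only ones that grow on galactose without IPTG, which is the basis of the selection.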

The selection system consists of two plasmids and a
λ prophage and is depicted in Figure 2. pWH414
makes use of the divergent tet regulatory region in
that both a tetR-lacI transcriptional fusion as well as a
tetA-lacZ fusion are present on the same plasmid. Tet
repressor is supplied in trans by a second compatible
plasmid (pRT240). The third component of the system
is the prophage λWH25 which provides a single
copy lacPO-galK fusion. The host strain is E. coli
WH207 and has a gal operon with the galK2 mutation
(see Materials and methods).

A qualitative analysis of this system shows that all
components behave as anticipated (see Table 1, lines
1 and 2). In presence of Tet repressor, the strain
is gal+ and lacZ− (line 2, galactose alone). In the
absence of Tet repressor, the strain is gal− (line 1,
galactose ± tetracycline; line 2, galactose + tetracycline).
In the absence of Tet repressor, lacI repression
FIGURE 2. [Caption partly illegible.] DNA is indicated by thin lines, relevant
genes as open boxes. [illegible] in pRT240 indicates the transcript
originating from [illegible]. [illegible] defines the kanamycin resistance
gene. The arrow in pWH414 indicates the bla gene. Tet
repressor is shown as a dimer and Lac repressor as a tetramer.

TABLE 1

Tet repressor and Lac repressor dependent expression
of galactokinase in E. coli WH207(λWH25)

Growth and phenotype of strains

[Table body not recoverable from the source. Row labels: pWH414;
pWH414 +; pWH414-2A; pWH414-2A +. The column headings name the
carbon sources and are only partly legible.]

[Footnotes partly legible: cells were plated on minimal plates
containing the indicated carbon source, incubated for 3 days and
scored. The plasmids pRT240 and pWH1200 are named, and the
tetracycline concentration is described as subinhibitory.]

can be partially alleviated by addition of isopropyl
thiogalactoside (IPTG) (line 1, galactose + IPTG);
complete derepression is probably not achieved because
Lac repressor is present in such a high amount
that it is never fully induced at the IPTG concentration
used (10^-5 M).

We have analysed the selection system with an
operator constitutive mutation to demonstrate the
necessity of functional tet operators for the observed
regulation. For this purpose pWH414-2A was used
instead of pWH414; it differs from the latter by a
total of 4-bp exchanges in the tet operators. Meier,
Wray and Hillen (1988) have shown that these
mutations reduce binding of Tet repressor by about
three orders of magnitude. The phenotypes in the
presence and absence of wild-type Tet repressor are
as anticipated (see Table 1, lines 3 and 4). Growth on
glucose yields lacZ+ phenotypes while growth on galactose
does not occur irrespective of the presence of
Tet repressor. In the presence of galactose and IPTG
this strain grows and is lacZ+ (see above).

For a quantitative determination of the selection
efficiency, mixtures of strains were grown on selective
plates. These contained cells with the components
shown in Figure 2, and an excess of cells in which
either the repressor encoding plasmid pRT240 was
replaced by the vector without tetR, or the wild type
operators (pWH414) were replaced by their constitutive
mutants (pWH414-2A). The results demonstrate
that 30 cells with wild-type Tet repressor and tet
operator can be efficiently selected on a single plate
among 10^[illegible] cells with either no Tet repressor or the
tet operator mutation. No white colonies indicating
repression of lacZ by Tet repressor are selected as
false positives from 10^[illegible] cells. The appearance of a few
blue colonies might be due to spontaneous mutations
of the lacI gene. It is the advantage of the divergent
tet regulatory region that these candidates can be
easily identified and discarded.

Temperature-sensitive Tet repressor mutants:
Temperature-sensitive Tet repressor mutations were
selected by their ability to confer growth on lactose at
42° and growth on galactose at 28° in appropriate
strains (see Materials and methods). Seven parallel
selections using individual preparations of
pRT240 from the E. coli mutator strain KD1067
(Degnen and Cox 1974) were carried through. Five
of these selections yielded colonies which were blue
at 42° and white at 28° with frequencies ranging
from 2 to 85%. The tetR genes from one candidate of
each of the seven selections were sequenced. The
obtained mutations are displayed in Figure 3.

Temperature-sensitive Tet repressor mutants contained
either a glycine to glutamic acid exchange at
position 21 (GE21) or an isoleucine to asparagine
exchange at position 193 (IN193). The latter was
independently selected four times. Another mutant
(see Figure 3) isolated by a different approach contains
an alanine to aspartic acid exchange at position 89
(AD89) and was included in the further in vivo analyses.
The two mutants without a temperature sensitive
phenotype were identical and had a C-terminal deletion
(Δ141). The wild-type and mutant tetR genes
were recloned resulting in two sets of plasmids directing
either "high" or "low" level expression of tetR.
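
The mutant names used here encode the wild-type residue, the replacement and the position (for example, GE21 is the glycine to glutamic acid exchange at position 21). A minimal sketch of how such names can be decoded follows; the helper name `parse_exchange` and the small lookup table are illustrative assumptions and cover only the residues named in this section.

```python
import re

# One-letter codes for the residues mentioned in the text (not a full table).
ONE_LETTER = {
    "G": "glycine", "E": "glutamic acid", "I": "isoleucine",
    "N": "asparagine", "A": "alanine", "D": "aspartic acid",
}


def parse_exchange(name: str):
    """Split a name such as 'GE21' into (wild type, mutant, position)."""
    match = re.fullmatch(r"([A-Z])([A-Z])(\d+)", name)
    if match is None:
        raise ValueError(f"not an amino acid exchange name: {name!r}")
    wild_type, mutant, position = match.groups()
    return (
        ONE_LETTER.get(wild_type, wild_type),
        ONE_LETTER.get(mutant, mutant),
        int(position),
    )


for mutant_name in ("GE21", "IN193", "AD89"):
    print(mutant_name, parse_exchange(mutant_name))
```

The deletion mutant does not follow this pattern, so the sketch rejects it with a `ValueError`.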

The mutants were assayed in vivo for repression of
a tetA-lacZ fusion at 28°, 37° and 42°. Furthermore,
inducibility by tetracycline and transdominance over
wild type was tested. The results are presented in
Table 2. Tet repressor mutants GE21, AD89 and
IN193 display a clear temperature dependency of lacZ
FIGURE 3. Sequences of mutant Tet repressors. [Caption partly illegible.]
The solid portion defines the potential α-helix-turn-α-helix motif
(Isackson and Bertrand 1985). The region of noninducible mutations is
hatched (amino acid residues 64 to 107; Smith and Bertrand 1988). The
hypervariable region (residues 151 to 166; Tovar, Ernst and Hillen 1988),
the isoleucine to asparagine exchange at position 193 and the Δ141 Tet
repressor protein are also mentioned.

repression, as evident from the ratios, whereas mutant
Δ141 does not show repression in this system at all.
At 28° and a "high" level of tetR expression IN193
shows almost wild-type activity and is clearly more
active than AD89. On the contrary, at a "low" level of
tetR gene expression IN193 is not as effective as wild
type and is even less active than AD89. The repression
efficiencies encoded by the "high" expression plasmids
are 95- and 900-fold higher for AD89 and IN193,
respectively, than the ones found in the "low" expression
plasmid. AD89 is only partially inducible by
tetracycline, whereas the other mutants can be fully
induced. GE21 and AD89 are transdominant.
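
The fold differences quoted above are simple ratios of residual β-galactosidase activities. The sketch below reproduces the AD89 figures as a check. The three input values are read from the damaged Table 2 and their assignment to AD89 is an inference, so treat them as assumptions; the function name `ratio` is hypothetical.

```python
def ratio(numerator: float, denominator: float) -> float:
    """Plain quotient of two beta-galactosidase activities (in % of control)."""
    return numerator / denominator


# Values as read from the scanned Table 2 (row assignment is inferred).
ad89_high_28 = 0.8    # "high" tetR expression, 28 degrees
ad89_high_37 = 43.2   # "high" tetR expression, 37 degrees
ad89_low_28 = 76.3    # "low" tetR expression, 28 degrees

# Effect of expression level at 28 degrees: about 95-fold, as stated in the text.
expression_effect = ratio(ad89_low_28, ad89_high_28)
# Temperature dependency at "high" expression: the 37/28 ratio, about 54.
temperature_effect = ratio(ad89_high_37, ad89_high_28)

print(round(expression_effect), round(temperature_effect, 1))
```

The corresponding "low" expression value for IN193 is not legible in the source, so the 900-fold figure cannot be recomputed here.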
Combinatorial mutagenesis at the C terminus of
the putative DNA recognition α-helix of Tet repressor:
Assuming that Tet repressor contains an α-helix-turn-α-helix
motif for operator recognition (Postle,
Nguyen and Bertrand 1984; Pabo and Sauer 1984;
Isackson and Bertrand 1985), it is very likely that
position 46 is part of the α-helix, whereas the secondary
structures of residues 47 to 49 remain unclear.
To gain information about their possible participation
in operator binding, a combinatorial cassette mutagenesis
(Reidhaar-Olson and Sauer 1988) of Tet repressor
was performed (see Materials and methods)
as shown in Figure 4. Mutant plasmids were transformed
to E. coli strains that either do or do not allow
selection for tet operator binding of Tet repressor.
tetR genes of candidates from both procedures were
sequenced in the region of mutagenesis. Thirty-four
TABLE 2 [title illegible]

Row labels are missing from the source. The rows are grouped by tetR
expression level (five "High" rows followed by five "Low" rows) and,
within each group, appear to follow the order none, wild type, GE21,
AD89, IN193.

β-Galactosidase activity at 28°, 37° and 42°:

Expression  28°                  37°                    42°
High        100.0 (±2.5)         100.0 (±3.4)           100.0 (±[illegible])
High        0.0 (±0.0)           0.1 (±0.1)             0.2 (±[illegible])
High        [illegible] (±5.1)   86.2 (±8.2)            99.3 (±12)
High        0.8 (±0.1)           43.2 (±3.1)            73.2 (±[illegible])
High        0.1 (±0.1)           [illegible]            89.2 (±1.3)
Low         [illegible] (±1.9)   104.4 (±[illegible])   100.0 (±0.4)
Low         29.7 (±0.0)          51.7 (±1.9)            [illegible]
Low         [illegible] (±2.7)   100.1 (±[illegible])   100.8 (±3.9)
Low         76.3 (±0.7)          [illegible]            93.7 (±0.7)
Low         [illegible]          [illegible] (±7.8)     97.4 (±4.1)

Ratio 37°/28°, in the order listed: 1; 4.7; 54.0; 20.3; 1.1; 1.7; 1.0; 1.2; 1.1

Tetracycline induction, in the order listed (column headings illegible):
100.0 (±[illegible])   100.0 (±4.5)   100.0 (±4.1)
1.5 (±0.2)             96.7 (±2.8)
48.6 (±[illegible])    104.8 (±4.7)
4.6 (±1.3)             16.8 (±[illegible])
1.3 (±0.8)             98.7 (±2.4)

Transdominance, in the order listed:

−wt Tet               +wt Tet
[illegible]           2.9 (±0.1)
1.1 (±0.1)            [illegible]
94.3 (±1.6)           18.5 (±1.3)
73.4 (±0.2)           5.7 (±0.3)
9.4 (±[illegible])    1.6 (±0.0)
97.2 (±4.5)           2.5 (±0.2)

Ratio, in the order listed: 1.0; 0.4; 8.4; 2.0; 0.6; 0.9

[Footnotes are largely illegible. They describe how the β-galactosidase
determinations were performed and how transdominance was determined.]



FIGURE 4. Cassette mutagenesis of positions 46 to 49 of Tet repressor. [Remainder of caption illegible.]



different mutants with either single or multiple
exchanges at positions 46 to 49 were obtained and
analysed in vivo for repression of a tetA-lacZ fusion at
28° and 37° and for tetracycline induction.

All mutants isolated with selection for Tet repressor
binding to tet operator give rise to wild-type lacZ
repression at 37°. The only exception was a triple
mutant which showed a significant derepression of
lacZ. At 28°, which was the temperature used for
mutant selection, this candidate also displayed wild-type
activity.

Single amino acid exchanges at positions 46 and 47
had no detectable effect on repressor activity (data
not shown). Three of the five mutants at position 48
TABLE 3 [title illegible; column headings partly inferred]

Tet repressor   28°                      37°                     Induction (+ tetracycline)
None            100.0 (±6.6)             100.0 (±5.6)            100.0 (±2.0)
Wild type       1.7 (±[illegible])       1.4 (±0.2)              104.7 (±7.4)
KR48            ND                       1.6 (±0.1)              ND
KQ48            5.0 (±0.4)               36.5 (±0.6)             105.7 (±6.2)
KH48            1.8 (±0.2)               16 (±1.0)               ND
KM48            ND                       [illegible] (±0.1)      ND
KV48            15.6 (±0.6)              45.2 (±5.5)             106.2 (±2.1)
RQ49            ND                       1.5 (±0.3)              10.3 (±0.3)
RG49            ND                       [illegible] (±0.2)      ND
RP49            ND                       1.5 (±0.1)              32.4 (±1.8)
RW49            ND                       1.5 (±0.1)              103.6 (±1.1)

[Footnote partly legible: β-galactosidase determinations were performed in
E. coli WH207 with strains grown at 28° and 37°. Induction with tetracycline
was also done at 37° (for details see Materials and methods).]

showed a lower than wild-type repression activity
(Table 3). The mutants at position 49 did not affect
repression efficiencies but two candidates displayed
only partial inducibility by tetracycline. Multiple
amino acid exchanges at positions 46 to 49 influenced
the repression activity only if position 48 was altered
and the tetracycline inducibility only if position 49
was altered (data not shown).

DISCUSSION 

Selection of Tet repressor binding to tet operator:
The selection described above is very efficient, because
single cells with wild-type Tet repressor binding
to wild-type tet operator are found among a vast excess
of up to 10^[illegible] cells with either no or reduced binding
of Tet repressor to tet operator on one plate. The
results with the 2A tet operator mutation show that
Tet repressors must have an association constant of
greater than 4 × 10^[illegible] M^-1 to tet operator in order to
be selectable in this system.

Temperature-sensitive Tet repressor mutants: As
depicted in Figure 3, GE21 is located in close proximity
to the proposed α-helix-turn-α-helix element. It
is the weakest DNA binder and shows the strongest
transdominant phenotype of all the mutants analysed
in this study. This mutant has been isolated previously
by Isackson and Bertrand (1985), but the authors
did not describe the temperature dependent effect we
have observed. We speculate that this mutation may
interfere with the positioning of the DNA binding
motif.

AD89 is located in a region where noninducible
mutants have been mapped previously (Smith and
Bertrand 1988). In agreement with these findings, AD89
shows only partial induction by tetracycline but also a
transdominant phenotype. At the same position
Smith and Bertrand (1988) have also isolated a

^ the DNA - and Inducer-binding 
£ 5* 1m !r d in tnmsmittini 

no function has been assigned so far. It gives rise to
the strongest temperature-dependent effect observed
in the course of this study. Tetracycline inducibility
as far as detectable in our system is not affected and
transdominance cannot be observed. When overnight
cultures for β-galactosidase determinations were
grown at 28° and log cultures were incubated at 42°,
mutant IN193 retains a much higher efficiency in lacZ
repression than AD89 (see footnotes to Table 2). This
phenotype corresponds to the tss ("temperature-sensitive
synthesis") mutants first described by Sadler
and Novick (1965), where the oligomerized protein
retains function upon shifting the culture to the
nonpermissive temperature. Assembly of new dimers is
inhibited at the nonpermissive temperature due to
either a defect in folding of the monomer or inhibition
of dimer formation (Goldenberg 1988). This might
indicate that IN193 dimers already formed at 28° are
not inactivated upon raising the temperature to 42°.
In contrast, it has been shown for mutant
AD89 that shifting the temperature to 42° clearly
inactivates the protein (B. Staoe and W. Hillen,
unpublished results). Western blot analyses have
shown identical levels of wild type and IN193 when
grown at 28° while at 37° no IN193 protein is detectable
(C. Berens and W. Hillen, manuscript to be
published). Taken together with the large increase in
repression with concentration (see Table 2) this leads
us to speculate that position 193 of Tet repressor
might be involved in dimer formation. The C termini
of Tet repressor proteins from five resistance classes
are rather homologous. They are preceded by a
hypervariable region (amino acid residues 151 to 166 of
Tn10 Tet repressor; see Figure 3 and Tovar, Ernst
and Hillen 1988) which could indicate a possible
C-terminal dimerization domain of Tet repressor.

Tet repressor mutants at positions 46 to 49: Sev- 
eral of the Tet repressor mutants at position 48 show 

Lys either directly contacts DNA or that it participates
in adjusting the structural conformation of the
DNA recognition α-helix. Mutants at position 49 of
Tet repressor show wild-type DNA binding, but in
some inducibility with tetracycline is reduced. This
phenotype can result from three effects: (i) reduced
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pcaioons. The result if particularly surprising tine. 
prev,ousiy identified mutations In i^Z^L^ 
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Bacieriol. U 78 477-487. 

umLJ ■ dea, « gar*, cneodin* «« HfIte Jz2?: 

S*"** Prot " ML « ^E?5£ 



«nd 107 (Smith and BzrtkaND 1988), in a rerion 

«-he ix. The large number of mutant*, and a demon- 
stration that some show reduced binding of u££ 

binding «te for tetracycline. Thus, Arg<° of Tet re 

E7T be - ,OCated at the "DNA side" of £ 

skT^^"* -nducerbindingto the DNa binding 
site. However, superreprwion as a result of additional 

tHrrZ J e 5 Te P T ' SSS0 ^ nono P entor complex 
Hecht and Sauer 1985) « aUo possible. In concl" 

H% the cowbinacorud mutagenesis Sligge4t , that 
Ly, may be involved in operator binding a nd ArS» 

detected for Lys" and Asn 47 . 

in ™,l!Ir" k B - fop J*"* 1 . * "am. -nd hi, .dvic, 

,nmn „l ( ^ W ' KI,, ' ful 10 C - for h»lp with 
Simon Delagrave, Ellen R.Goldman and Douglas C.Youvan

56-213, 77 Massachusetts Avenue, Cambridge, MA 02139

We have developed a generally applicable experimental
procedure to find functional proteins that are many
mutational steps from wild type. Optimization algorithms,
which are typically used to search for solutions to certain
combinatorial problems, have been adapted to the problem of
searching the sequence space of proteins. Many of the
steps normally performed by a digital computer are embodied
in this new molecular genetics technique, termed recursive
ensemble mutagenesis (REM). REM uses information gained
from previous iterations of combinatorial cassette mutagenesis
(CCM) to search sequence space more efficiently. We have
used REM to simultaneously mutate six amino acid residues
in a model protein. As compared to conventional CCM, one
iteration of REM yielded a 30-fold increase in the frequency
of positive mutants. Since a multiplicative factor of similar
magnitude is expected for the mutagenesis of additional sets
of six residues, REM on 18 sites is expected to yield an
exponential increase in the throughput
of positive mutants as compared to random [NN(G,C)]18
mutagenesis.
Key words: light harvesting II/protein engineering/random
mutagenesis

Introduction „ ■* 

even to predict function ^"SJJ^'JS^ of 
vendng the gt*s i» ^"jg^JSnK^iW * 

randoniized (mutates wimau hich mmt be 

evaluated to identify ^jJ^^Soi methods have 

where n is the number of ^ ™£J3^ rompl « libraries 

been -^H^.-J^wW * S333fflU» **** 

?I mU - m tt WsfS^oSt at. 199U Kang « d. , 
libraries (Smith, 1985. (Beaudry and Joyce. 

jgg Tn^eTWysS^heX £«ype, -J 
1992) are instances ui »j ^ selection and 

amplification of w ^" y Jr^ prW eins with 20 randomized 



different proem molecules is required. Obvuusly, ft. wg 
Senge our technical capabilities ^^J^^XS 
SesirabU avoid ^^^S^cXSuS 
proteins n a random library and supply ' "> n »n JL,^ 
of functional proteins, ^^^^^SSS^ 
to achieve a useful sampling of * < £^^ h l i nhancal 
ensemble mutagenesis (REM) U JT an 

ReTdhaar-OUon r««l. 1991) » •J^tSSSon 
genetical.y altered proteins 

1). Amino acids are retained in the target set if they occur in
an altered protein fitting the selection criteria. Lists of all amino
acids that are acceptable at each mutated position ('target
sets' of amino acids) are compiled. In the next iteration,
combinatorial cassettes are synthesized using
mathematical functions that bias the nucleotide mixtures (Arkin
and Youvan, 1992b; Youvan et al., 1992) at each mutated



Synthesize Random CCM Cassettes
Express CCM Library
Select or Screen Positive Mutants
Determine DNA Sequence
Deduce Unique Polypeptide Sequences
Calculate New Doping Scheme Using SSD or PG Algorithms
Synthesize New Cassettes
Next Iteration

Fig. 1. Flow chart of recursive ensemble mutagenesis (REM), based on
combinatorial cassette mutagenesis (CCM). [Caption partly illegible.]
Positive mutants are identified by selecting or screening a CCM library.
Two or more positives are then picked and sequenced. Next, a list of
unique amino acid sequences is determined by translating these DNA
sequences. Target sets for each mutated position in the protein are
compiled and the most appropriate dope is determined by a mathematical
function such as group probability. [illegible] the complexity of
[illegible] arising from CCM should be shown to be in vast excess of
[illegible] (Youvan et al., 1992).


position in the protein to encode these target sets of amino acids.

For example, if Ala, Ser and Thr occur at a given position in
the positive mutants, these amino acids constitute the target set
at that position, and the nucleotide mixture sought is the
'dope' that maximizes the probabilities of the amino acids
in the target set. The next cassette is then designed such that the
target set is encoded by a simple mixture of nucleotides at that
codon (e.g. [(G,A,T)(C)(G,C)]). In certain cases where there
is a good match between selection criteria and structure inherent
in the genetic code (Sjostrom and Wold, 1985; Youvan, 1991),
such as hydropathy and molar volume, computer simulations
predict that multiple iterations of REM will yield thousands of
times more mutants than conventional CCM (Arkin and Youvan,
1992b; Youvan et al., 1992).
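
To make the idea of a doped codon concrete, the sketch below translates every codon in an equimolar nucleotide mixture and reports how often each amino acid is encoded. It uses the standard genetic code; the function name `dope_distribution` is hypothetical, and the restriction to equimolar mixtures is a simplifying assumption.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def dope_distribution(first: str, second: str, third: str) -> dict:
    """Amino acid frequencies for an equimolar mixture at each codon position."""
    codons = ["".join(bases) for bases in product(first, second, third)]
    counts = Counter(CODON_TABLE[codon] for codon in codons)
    return {aa: Fraction(n, len(codons)) for aa, n in counts.items()}


# The example codon from the text, [(G,A,T)(C)(G,C)].
print(dope_distribution("GAT", "C", "GC"))
# The fully random [NN(G,C)] codon used in the zero iteration.
print(len(dope_distribution("ACGT", "ACGT", "GC")))
```

For [(G,A,T)(C)(G,C)] the distribution contains only Ala, Thr and Ser, each at one third, which is why this mixture encodes that target set exactly.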

As a model system to experimentally verify the computer
predicted amplification by REM, the light harvesting II (LHII)
β-subunit gene (Youvan and Ismail, 1985) of Rhodobacter
capsulatus was chosen. The LHII protein has two characteristic
absorption bands in the near infrared (800 and 858 nm) that are
red shifted relative to protein-free bacteriochlorophyll (Bchl)
absorption at 770 nm. These prosthetic groups serve as colorimetric
indicators of protein expression and subunit assembly. Six
carboxy-terminal residues of the β-subunit were initially mutated
by construction of a combinatorial cassette containing the
sequence [NN(G,C)]6, where 'N' designates an equiprobable
mixture of all four nucleotides. This CCM library was conjugated
into a strain of R.capsulatus (U71) totally deficient in Bchl-binding
proteins or any other compounds with significant absorption
in the near infrared (Youvan et al., 1985). This deletion
background facilitates the use of digital imaging spectroscopy
(DIS) (Arkin et al., 1990; Arkin and Youvan, 1993) to screen
thousands of colonies directly on Petri dishes for LHII expression.
We then sequenced five functional mutants and used this limited
data to construct a new CCM library. The frequency of positives
was increased 30-fold relative to the original library.

Materials and methods

Plasmids and strains

Plasmid pU4b is a shuttle vector used for cassette mutagenesis
as well as expression of the mutant LHII genes (Goldman and
Youvan, 1992). M13 was our vector for single-stranded
sequencing and was propagated in Escherichia coli MV1190.
Escherichia coli strain S17-1 was used for library construction
and conjugation with R.capsulatus U71. For expression of the
libraries, R.capsulatus U71, an LHII chromosomal deletion
background (LHII and reaction center expression inactivated by
a point mutation), was used.
Materials and DNA manipulations
DNA manipulations were essentially performed as described by
Sambrook et al. (1989). Restriction enzymes were obtained from
New England Biolabs; T4 DNA ligase was from Bethesda
Research Labs, as was Taq polymerase. Sequencing was carried
out using a Sequenase kit from United States Biochemicals.
Electroporation was carried out in 0.2 cm cuvettes on 0.45 ml
of competent cells using a Bio-Rad electroporator according to
instructions provided. All oligonucleotides were synthesized on
an Applied Biosystems model 381 DNA synthesizer using
commercially available reagents.
Library construction 

The unique KpnI and XhoI sites of pU4b flank the region encoding
the dimer Bchl binding site and the carboxy-terminus of the
β-subunit LHII gene. These restriction sites were engineered to
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♦Pcsk shifted ui K45 nm. 

allow double-stranded combinatorial cassettes to be subcloned
in place of the wild type sequence.

The sense strand of the 113-mers, which included the
KpnI-XhoI sites, as well as two PCR primers (20-mers each
spanning a restriction site) were synthesized. The doped sequence
within the cassette used in the zero iteration was [NN(G,C)]6.
The purified 113-mer was amplified by PCR. Amplified
double-stranded cassette was then purified by phenol extraction and
ethanol precipitation. Complete digestion of the cassette with KpnI
and XhoI is carried out in a single incubation. The digested
cassette is then purified by phenol and ether extractions and
ultrafiltration in a Centricon 30 device (Amicon).

Ligation is carried out for 24 h at [temperature illegible] in 20 µl with
approximately 0.1 µg of pU4b similarly digested with KpnI and
XhoI. The resulting pU4b derivatives (an aliquot of the ligation)
are directly electroporated into S17-1 E.coli. Aliquots of the
transformation are plated on LB-tetracycline plates (after allowing 1 h
for resistance expression) for complexity estimation and the
remainder of the transformation is incubated overnight in 60 ml
of LB-tetracycline. Plasmid pU4b derivatives were conjugated
from E.coli S17-1 donors into R.capsulatus strain U71. The
library is expressed by U71 transconjugants selected for by
growth on RCV-tetracycline plates at 32°C.
Dope optimization 

In computer simulations, various functions were used to optimize
the 'nucleotide mixtures'. In this work, only five functional
mutant sequences were obtained in the zero iteration. Given this
small number of sequences and in order to conserve diversity,
we elected to use the group probability (P_G) function because
it retains all amino acids in the target set. When presented with
a target set at one position, the program 'CyberDope' (provided
courtesy of KAIROS Inc., Cambridge, MA, USA) goes through
all integer nucleotide mixtures possible for a codon and evaluates
for each mixture the value of P_G:

P_G = \prod_{i} P_D[i]

where P_D[i] is the frequency of occurrence of the ith amino acid
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position. The doped sequence within the cassette usw 

(c6)G]1(a!g,T)(C:T)G][(C,T.O)(C,T.O)C] 

Imaging spectroscopy 

Colonies were imaged as spreads on [illegible]
the bacteria resuspended after conjugation. The most recent
configuration of the digital imaging spectrophotometer is
described (Arkin and Youvan, in press). For the
images, the Petri dishes were illuminated with broad-band
blue-green light and an 830 nm long pass filter was placed in
front of the CCD lens to obtain radiometrically calibrated
monochrome images which were linearly mapped to pseudocolor
after establishing the low and high gray scale values for both
images.

Results 

The experimental complexity (i.e. number of independently
generated clones) of the 'zero iteration' [NN(G,C)]6 library was
approximately 45 000. The theoretical complexity of such a
library at the nucleotide level is calculated as 32^6 (1.1 × 10^9)
because there are 32 possible [NN(G,C)] codons; the
experimental complexity is only a small fraction of this number.
Preliminary screening used fluorescence (Yang and Youvan,
1988), which is indicative of LHII assembly, to rapidly identify
mutants expressing LHII. Mutants are then more closely
evaluated by ground state absorption measurements using DIS.
We observed a low frequency of highly fluorescent colonies in
the zero iteration of REM (ca. one positive mutant in 10 000
colonies screened). Relative to wild type absorption, DIS showed
a decrease in the optical density at 800 and 858 nm for these
few positives.
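
The complexity figures above follow from counting codons. A short arithmetic sketch (variable names are illustrative) shows how small a fraction of the theoretical library was sampled:

```python
# An [NN(G,C)] codon has 4 x 4 x 2 = 32 possible nucleotide sequences.
codons_per_position = 4 * 4 * 2
randomized_positions = 6

theoretical_complexity = codons_per_position ** randomized_positions
experimental_complexity = 45_000  # independent clones reported in the text

fraction_sampled = experimental_complexity / theoretical_complexity
print(theoretical_complexity)          # 1073741824, i.e. about 1.1e9
print(f"{fraction_sampled:.1e}")       # fraction of the library actually sampled
```

The count is at the nucleotide level; the number of distinct peptides is smaller because several codons encode the same amino acid.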

Because of their rarity, only five positives were obtained from 
the zero iteration of REM. Four of these five mutants fit the 
selection criterion of displaying significant absorbance at 858 nm 
and another, REM0.10, had an interesting phenotype. The five 
positives were repurified and sequenced (Table I). The 
composition of a first iteration cassette was calculated by the 
computer program 'CyberDope'. which generates DNA dopes 
that maximize the overall probability of the target set. To add 
diversity to the target set, the wild type sequence was also 
included. Therefore, while not taking frequency of occurrence 
into account because of the small sample size, for the first doped 
position the target set is F, S, A, L. The output of CyberDope 
at the nucleotide level gave the codon [(G,T)(C,T)(C,G)], which
encodes amino acids A, S, V (0.25 probability of occurrence
for each) and F, L (0.12 probability of occurrence for each).
Valine is unavoidably encoded by this dope because of the 
structure of the genetic code. 
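
The group probability search can be illustrated with a simplified stand-in for CyberDope, whose implementation is not described here. The sketch below makes two assumptions that differ from the program in the text: it considers only equimolar mixtures rather than all integer mixtures, and it restricts the third codon position to G and/or C, as in the [NN(G,C)] cassette. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def subsets(bases: str) -> list:
    """All non-empty equimolar mixtures that can be drawn from the given bases."""
    return ["".join(s) for r in range(1, len(bases) + 1)
            for s in combinations(bases, r)]


def group_probability(mixture, target: str) -> Fraction:
    """P_G: product over the target set of each amino acid's frequency."""
    codons = ["".join(bases) for bases in product(*mixture)]
    return prod(
        Fraction(sum(CODON_TABLE[codon] == aa for codon in codons), len(codons))
        for aa in target
    )


def best_dope(target: str, third_bases: str = "CG"):
    """Exhaustive search for the mixture with the highest P_G."""
    candidates = product(subsets("ACGT"), subsets("ACGT"), subsets(third_bases))
    return max(candidates, key=lambda mixture: group_probability(mixture, target))


dope = best_dope("FSAL")
print(dope, group_probability(dope, "FSAL"))
```

Under these assumptions the search returns the mixture (G,T)(C,T)(C,G) for the target set F, S, A, L, matching the codon reported above. Because P_G is a product, any mixture that drops a target amino acid scores zero, which is why this function retains the whole target set.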

Figure 2 demonstrates the amplification properties of the REM 
methodology as assayed by digital imaging spectroscopy using 
both fluorescence emission and ground state absorption imagery. 
The first iteration of REM yields a 30-fold increase in the 
frequency of enhanced fluorescence mutants (Figure 2 A and B). 
As compared to zero iteration REM data, DIS analysis of the 
first iteration library shows both an increase in the percentage
of positive mutants (i.e. throughput) and an increase in protein 
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levels as 
(Figure 2 
Twelve 
"REM 
sequences 
REM1.9) 
blue-shifted 
found in all 
of 0-subur|hs, 
iteration 
in the first 
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I mutunts nave an mva *.»»"■ K 

; 2 9 sequences in the known phylngeuy < /.uhe r 
lis This phenotype was Una observed in ihe /cm 
litfe muJni REMIMO). hut now Imds itselt ainplilia 
Son. Note also thai RHM». m contains the miiik* 
hotif inversion. 



Discussion

To test the computer
prediction of an increased throughput of positives, the LHII β-subunit
was iteratively mutagenized at its six carboxy-terminal residues.
From the zero iteration (CCM) library, target sets
were defined. A computer encoded algorithm generated a doped
oligonucleotide which best represented the target set at each
mutagenized position. Expression of this new library (the first
iteration of REM) revealed a substantial amplification in the
throughput of pseudo wild type mutants. From the zero iteration
library, where roughly 10 000 colonies were screened to identify
one positive, we can now conveniently identify a new positive
by screening only about 300 colonies. This corresponds to a
30-fold increase in overall throughput, suggesting that mutagenizing
18 sites of similar stringency would yield a 30^3 or 27 000-fold
increase in throughput over random mutagenesis using
[NN(G,C)]18.
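
The projection above is a multiplicative extrapolation. The following sketch spells out the arithmetic, assuming (as the text does) that each additional block of six sites contributes an independent gain of similar size; the variable names are illustrative.

```python
colonies_per_positive_zero = 10_000   # zero iteration (random CCM)
colonies_per_positive_first = 300     # first iteration of REM

# Observed gain for one block of six sites: about 33, reported as 30-fold.
observed_gain = colonies_per_positive_zero / colonies_per_positive_first

# Extrapolation to 18 sites, treated as three independent six-site blocks.
gain_per_block = 30
blocks = 18 // 6
projected_gain = gain_per_block ** blocks

print(round(observed_gain), projected_gain)
```

The independence of the blocks is the key assumption; the source presents the 27 000-fold figure as a suggestion rather than a measured result.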

The altered proteins obtained by combinatorial mutagenesis
are not necessarily trivial variations of the wild type sequence.
An inversion of a completely conserved motif was observed in
some mutants. Therefore, the sequence data indicate that REM
does not recapitulate the known phylogeny. Mechanistically, the
simultaneous (experimental) randomization of six sites in a protein
may have no analogy in nature.

In this work, experimental evidence is given that REM allows
an efficient search of sequence space by producing mutant
libraries with increased frequencies of selected 'positives'. Due
to the high stringency of the region chosen for mutagenesis, only
a small sequence database was available for the construction of
the first iteration dope. In systems where large complexities can
be achieved easily (e.g. phage display libraries), more sites can
be mutated at once and more positives isolated, giving a more
complex sequence database. As a consequence, other dope
optimizing equations (Youvan et al., 1992) could be used which
would be better suited to yield large increases in throughput.
Alternatively, different short stretches of amino acids could be
randomized and the zero iteration data from these libraries pooled
to produce a first iteration dope mutagenizing many more sites
than ordinarily possible with CCM.

It is important to make the connection between our
algorithmically-based doping schemes and protein engineering
projects where CCM is currently being used. REM decreases
the fraction of null mutants in the population, therefore more sites
can be simultaneously mutated. Model experiments on LHII can
be used to optimize REM methodology, including the nucleotide
doping equations. While DIS is limited to screening about 10^[illegible]
colonies, phage display libraries (Smith, 1985; Hoogenboom
et al., 1991; Kang et al., 1991) can be used to select mutants
from libraries with complexities exceeding 10^[illegible]. Based on our
preliminary experiments, we expect greater phenotypic diversity
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can be isolated, which * ^K^^&ce addUiiwal diversity 
inanwhodylibraiiasn^airwuyF enhun(X j by (he use ot "ur 

pnlSe. .^ence space 

in a mathematically rigorous fashion. 
A method of combinatorial cassette mutagenesis was
designed to readily determine the informational content
of individual residues in protein sequences. The technique
consists of simultaneously randomizing two or three
positions by oligonucleotide cassette mutagenesis, selecting
for functional protein, and then sequencing to determine
the spectrum of allowable substitutions at each
position. Repeated application of this method to the
dimer interface of the DNA-binding domain of λ repressor
reveals that the number and type of substitutions
allowed at each position are extremely variable. At some
positions only one or two residues are functionally acceptable;
at other positions a wide range of residues and
residue types are tolerated. The number of substitutions
allowed at each position roughly correlates with the
solvent accessibility of the wild-type side chain.



IT HAS BEEN MORE THAN 20 YEARS SINCE ANFINSEN AND HIS
colleagues showed that the sequence of a protein contains all of
the information necessary to specify the three-dimensional
structure (1). However, the general problem of predicting protein
structure from sequence remains unsolved. Part of the difficulty may
stem from the complexity of protein structures. Although some 200
protein structures are known, no rules have emerged that allow
structure to be related to sequence in any simple fashion (2). The
problem is further complicated by the nonuniformity of the structural
information encoded in protein sequences. Some residue
positions are important, and changes at these positions can tip the
balance between folding and unfolding (3-7). Other residues are
relatively unimportant in a structural sense and a wide range of
substitutions or modifications can be tolerated at these positions (3,

If only a fraction of the residues in a protein sequence contribute
significantly to the stability of the folded structure, then it becomes
important to be able to identify these residues. We now describe the
results of genetic studies that allow the importance of individual
residues in protein sequences to be rapidly determined. Specifically,
we determine the spectrum of functionally acceptable substitutions
at residue positions near the dimer interface of the NH2-terminal
domain of phage lambda (λ) repressor (10). The NH2-terminal
domain binds to operator DNA as a dimer, with dimerization
mediated by hydrophobic packing of α-helix 5 of one monomer
against α-helix 5' of the other monomer (11) (Fig. 1, A and B).
Without helix 5 there are no contacts between the subunits (Fig.
1C). By applying combinatorial cassette mutagenesis to the helix 5
region, we find that the number and spectrum of allowable substitutions
within helix 5 are extremely variable from residue to residue.
In most cases, this variability can be rationalized in terms of the
fractional solvent accessibility of the wild-type side chain.

General strategy. For our studies, we used a plasmid-borne gene
that encodes a functional, operator-binding fragment (residues 1-
102) of λ repressor (12). The binding of the 1-102 fragment to
operator DNA depends on dimerization which, in turn, depends on
the helix 5-helix 5' packing interactions (11, 13). Thus, if a 1-102
protein retains normal operator-binding properties, we can infer
that it is able to dimerize normally.

Mutagenesis of the helix 5 region was performed by a combinatorial
cassette procedure. One example of this method, in which
codons 85 and 88 are mutagenized, is illustrated in Fig. 2. On the
top strand, the mutagenized codons are synthesized with equal
mixtures of all four bases in the first two codon positions and an
equal mixture of G and C in the third position. The resulting
population of base combinations will include codons for each of the
20 naturally occurring amino acids at each of the mutagenized
residue positions. On the bottom strand, inosine is inserted at each
randomized position because it is able to pair with each of the four
conventional bases (14). The two strands are then annealed and the
mutagenic cassette is ligated into a purified plasmid backbone.
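
The claim that this base mixture encodes all 20 amino acids can be checked by enumeration. The following is a minimal sketch, not part of the original procedure: it assumes the standard genetic code, and the function names (randomized_codons, amino_acid_counts) are hypothetical names chosen for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def randomized_codons(first="ACGT", second="ACGT", third="GC"):
    """All codons produced by the given base mixture at each codon position."""
    return ["".join(c) for c in product(first, second, third)]


def amino_acid_counts(codons):
    """Number of codons in the mixture encoding each residue ('*' is a stop)."""
    return Counter(CODON_TABLE[c] for c in codons)


codons = randomized_codons()
counts = amino_acid_counts(codons)
encoded = sorted(aa for aa in counts if aa != "*")
stops = [c for c in codons if CODON_TABLE[c] == "*"]

# Prints: 32 codons, 20 amino acids, stop codons: ['TAG']
print(len(codons), "codons,", len(encoded), "amino acids, stop codons:", stops)
```

Under these assumptions the mixture contains 32 codons, and the amino acids are not equally represented: Leu, Ser, and Arg each have three codons, whereas residues such as Met and Trp have one.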

To identify plasmids encoding functional protein, we selected
transformants for plasmid-encoded resistance to ampicillin and for
resistance to killing by cI- derivatives of phage λ. The latter selection
requires that the cell express 1-102 protein that is active in operator
binding (15). For each mutagenesis experiment, many independent
transformants were chosen, single-stranded plasmid DNA was
purified, and the relevant region of the 1-102 gene was sequenced.
The resulting set of sequences provides a list of functionally
acceptable helix 5 residues.

Substitutions in the helix 5 region. In separate experiments with
different mutagenic cassettes, the codons for helix 5 residues 85 and
88; 86 and 89; 90 and 91; 84, 87, and 88; and 84, 87, and 91 were
mutagenized, and genes encoding active 1-102 proteins were
selected. In some cases, the survival frequency was low. For example,
only 17 of 60,000 transformants passed the selection after randomization
of codons 84, 87, and 88. In this case, each active candidate
was sequenced. By contrast, 1,200 of 50,000 transformants passed
the selection in the mutagenesis of positions 86 and 89 (16). In this
case, we picked 50 candidates for sequence analysis. Overall, 150
active genes were sequenced (Table 1). In addition, we sequenced
[illegible] mutants obtained in the absence of a functional
selection. These serve as controls for the efficiency
of mutagenesis and also provide examples of helix 5 mutations that
result in inactive 1-102 proteins (Table 1).

Many of the active sequences contain at least two residue changes
compared to wild type. In principle, some of these changes could be
compensatory; for example, residue X might be functionally allowed
at position 85 only in combination with residue Z at position 88.
This cannot be generally true, however, because most residue
changes at one position were recovered in combination with several
different changes at the other position or positions. It is therefore
likely that most substitutions that are functionally acceptable in
multiply mutant backgrounds would also be allowed as single
substitutions. In Fig. 3, we show the spectrum of functionally
acceptable substitutions at residue positions 84 to 91.

Table 1. Sequences for the helix 5 region of active and inactive mutants
obtained by combinatorial cassette mutagenesis. Active mutants are resistant
to phage λKH54; these are grouped by cassette, with the wild-type sequence
at the top of each group and randomized positions in boldface. Asterisks
indicate sequences of mutants obtained in the absence of a functional
selection. The activity of these mutants was subsequently determined by a
screen. Numbers next to sequences indicate the number of times particular
mutant sequences were obtained. Numbers at the tops of the columns
indicate amino acid positions. The one-letter abbreviations for the amino
acids are: A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K,
Lys; L, Leu; M, Met; N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V,
Val; W, Trp; and Y, Tyr.

From the list of allowed substitutions, several conclusions may be
drawn concerning the structural requirements at various positions in
helix 5. We now consider these residue positions in order of
decreasing "informational content," where this term is roughly
defined as a value that decreases as the number of allowed substitutions
increases. Thus, the informational content of a residue position
is highest if only the wild-type amino acid is allowed and is lowest if
each of the 20 naturally occurring amino acids is allowed.
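
The rough definition above can be made concrete in more than one way. The sketch below uses log2(20/n) bits, where n is the number of acceptable residues; this particular formula is an assumption made for illustration and is not given in the text, which only requires that the value fall as n rises. The counts in ALLOWED are those reported in this article for positions 84 to 91, and the names ALLOWED and information_bits are hypothetical.

```python
import math

# Number of functionally acceptable residues reported at each helix 5 position.
ALLOWED = {84: 1, 85: 13, 86: 13, 87: 2, 88: 9, 89: 13, 90: 9, 91: 7}


def information_bits(n_allowed, n_total=20):
    """Illustrative score (an assumption): log2(n_total / n_allowed).

    It is maximal when only one residue is allowed and zero when all
    n_total residues are allowed.
    """
    return math.log2(n_total / n_allowed)


# List positions from highest to lowest informational content.
for position, n in sorted(ALLOWED.items(), key=lambda item: item[1]):
    percent = 100.0 * n / 20
    print(f"position {position}: {n:2d} allowed "
          f"({percent:.0f}% of 20), {information_bits(n):.2f} bits")
```

With this score, positions 84 and 87 rank highest and positions 85, 86, and 89 rank lowest, which is the order in which the positions are discussed.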

Positions 84 and 87 in particular stand out as having a high
informational content. Ile appears to be the only acceptable residue
at position 84. Met and Leu, residues of similar size and
hydrophobicity, are the only two residues that appear to be
functional at position 87. The side chains of Ile84 and Met87 form a
major part of the helix-helix packing interaction at the dimer
interface, where Ile84 of one subunit packs against Met87' of the
other subunit, and vice versa (Fig. 4). This cluster of four residues
also contacts the globular portions of the domain. Solvent accessibility
calculations by the method of Lee and Richards (17) show that
the Ile84 and Met87 side chains are almost completely buried (92 to
98 percent solvent inaccessible) in the structure of the dimer. We
assume that replacement of Ile84 or Met87 with smaller side chains
would diminish dimerization because hydrophobic and van der
Waals interactions would be lost. In fact, mutant repressors containing
Ser84 or Thr87 are defective in dimerization (13, 18). Replacing
Ile84 or Met87 with larger residues would also be expected to be
detrimental because substantial structural rearrangements would be
required to accommodate larger side chains.

Seven residues (Leu, Ile, Val, Thr, Cys, Ser, and Ala) are
functionally acceptable at position 91. Aromatic residues, charged
residues, and strongly hydrophilic residues are not found. The wild-type
Val side chain is partially buried in the dimer structure, with the
Cγ2 methyl group packing against the Cδ1 methyl group of the
Ile84' side chain. Although some of the acceptable substitutions such
as Ile and Thr could make equivalent packing contacts, others such
as Ala and Ser could not.

Nine residues (Trp, His, Met, Gln, Leu, Val, Ser, Gly, and Ala)
are acceptable at position 90. There is a surprisingly large range in
both the acceptable size and hydrophilicity of these side chains. This
is especially true as the Cβ methyl group of the wild-type Ala is
almost completely buried in the structure of the dimer and, at first
glance, it would appear that larger side chains could not be
accommodated. However, the inaccessibility of the Cβ methyl
group of Ala90 is largely caused by the Lys67' side chain, which packs
against it. By rotating the Lys67' side chain away, we were able to
introduce a Trp90 side chain by model-building without steric
clashes. Rotation of the Lys67' side chain away from Ala90 should
not be energetically costly and, in fact, is observed in crystals of the
NH2-terminal domain bound to operator DNA (19).

Nine different residues (Trp, Tyr, Phe, Met, Ile, Val, Cys, Ser, and
Ala) are functionally acceptable at position 88. There are large
variations in the shapes and volumes of the acceptable side chains,
although most are relatively hydrophobic. Charged residues and
other strongly hydrophilic residues are not observed. In the wild-type
dimer (11), the aromatic ring of Tyr88 stacks against the ring of
Tyr88'. The side chains of Trp, Phe, Met, Ile, and Val could probably
form some type of packing interaction at this position, although
those of Ala and Ser could not. It is known that the presence of Cys
at position 88 allows a stable Cys88-Cys88' disulfide bond, which
links the monomers in a conformation that is active in operator
binding (20).

Positions 85, 86, and 89 show considerable variability. At each of
these positions, 13 different amino acids were found to function. At
positions 85 and 86, aromatic, hydrophobic, polar, and charged
residues are all acceptable. At position 89, aromatic residues were
not represented, but each of the remaining classes was observed. In
the wild-type dimer, the side chains of Tyr85, Glu86, and Glu89 are
relatively solvent accessible.

Fig. 1. Three views of the DNA-binding domain of λ repressor, showing the
[illegible]. (A) Proposed complex of repressor dimer and operator DNA.
Helix 5 of each monomer is colored more lightly than the globular portion
of that monomer. (B) Free repressor dimer, rotated 90° from the view in
(A), to show the "back side" of the molecule. (C) Dimer with helix 5 of
each monomer removed. This view illustrates the role helix 5 plays in
mediating dimerization (26).

Fig. 2. Schematic diagram showing the combinatorial cassette mutagenesis
procedure. At positions indicated as N, an equal mixture of A, G, C, and T
was used during oligonucleotide synthesis. At positions indicated as I,
inosine was used. After synthesis, the oligonucleotides were phosphorylated,
annealed, and ligated into the Xho I-Sph I backbone of plasmid pJO103.
Plasmid pJO103 is an M13 origin plasmid with the 1-102 gene under
control of a tac promoter; the region of the 1-102 gene encoding residues
82-93 (the small Xho I-Sph I fragment) is replaced by an unrelated 1.9-kb
Xho I-Sph I "stuffer" fragment. Ligated DNA was transformed into
Escherichia coli strain X90 F'lacIq cells (21), and ampicillin-resistant colonies
[illegible] described in (21). Single-stranded plasmid DNA was purified from
an M13RV1 transducing lysate as described and DNA sequences were
determined by the dideoxy method (29).

Several amino acids are significantly underrepresented among the
active sequences. For example, Pro is never found. This cannot be an
artifact of our mutagenesis procedure because Pro is frequently
observed among the unselected mutant sequences (Table 1). We
conclude that Pro is not found among the functional sequences
because it is selected against; its presence would presumably disrupt
the α-helical structure and thereby the helix-helix packing at the
dimer interface.

His, Asn, and Lys are also underrepresented among the functional
helix 5 sequences. These residues are presumably not acceptable at
positions 84 and 87, where the informational content is extremely
high, and may not be acceptable at positions 88 and 91, where the
functional substitutions are generally hydrophobic in character. The
acceptability of these residues at positions such as 85 and 86 is
difficult to assess from our experiments because the codons for these
residues are present at reasonably low frequencies even among the
unselected sequences. In these cases, we probably have not sequenced
a large enough number of candidates to be confident that
all acceptable substitutions have been identified. In fact, data from
reversion studies (21) and suppressed amber studies (22) show that
His85 and Lys86 are acceptable substitutions in the context of the
intact λ repressor molecule.

Informational content and protein structure. We have combined
an efficient combinatorial mutagenesis procedure and a
functional selection to probe the informational content of the eight
residues that form the major part of the dimerization interface of the
NH2-terminal, operator-binding domain of λ repressor. At two of
these eight residue positions, the functionally acceptable choices are
highly restricted. For example, we analyzed 17 functional genes in
which codon 84 had been randomized and recovered the wild-type
residue, Ile, in every case. This is clearly a position of high
informational content. The informational content is also high at
position 87, where Met and Leu are the only acceptable residues. By
contrast, the remaining positions have moderate to low informational
contents. For example, among 38 functional genes in which
codon 85 had been randomized, the wild-type residue was recovered
only once, and 12 other residues, differing in size and chemical
properties, were recovered in the remaining cases. This is clearly a
position of low informational content. It is striking that most of the
structural determinants of dimerization in this eight-residue segment
reside in two residues only. The remaining positions are
surprisingly tolerant of a wide range of substitutions. If this high
level of tolerance is generally true of protein sequences, then the
problem of understanding and predicting structure may rest largely
on the ability to identify those few residues that are crucial.

Fig. 3. Functionally acceptable residues in the helix 5 region. The amino
acids are listed from left to right in order of increasing hydrophobicity
according to the scale of Eisenberg et al. (30).

Position  Wild type  Functionally acceptable residues
84        Ile        Ile
85        Tyr        Arg, Gln, Glu, Ser, Thr, Tyr, Cys, Gly, Ala, Trp, Leu, Val, Ile
86        Glu        Asp, Gln, Glu, Ser, Thr, Tyr, Gly, Ala, Met, Trp, Leu, Phe, Ile
87        Met        Met, Leu
88        Tyr        Ser, Tyr, Cys, Ala, Met, Trp, Val, Phe, Ile
89        Glu        Arg, Lys, Asp, Gln, Glu, Ser, Thr, Cys, Gly, Ala, Met, Leu, Ile
90        Ala        Gln, His, Ser, Gly, Ala, Met, Trp, Leu, Val
91        Val        Ser, Thr, Cys, Ala, Leu, Val, Ile

Fig. 5. Correlation between the solvent accessibility and the number of
functionally acceptable substitutions. Hatched bars indicate the percentage
of the 20 naturally occurring amino acids that are functionally acceptable at a
residue position. Black bars indicate the fractional solvent accessibility of the
wild-type side chain in the dimer. Solvent accessibilities for the NH2-terminal
domain dimer (11) were computed using a 1.4 Å probe by the
method of Lee and Richards (17). Fractional accessibilities were obtained by
dividing by the appropriate side chain accessibilities calculated for the
monomer. The fractional accessibilities change only slightly if the side chain
accessibilities in the reference tripeptide Ala-X-Ala (17) are used instead as
the reference state.

The positional variability of the informational content in helix 5
can, in general, be rationalized in terms of the solvent accessibility of
the wild-type residues in the crystal structure (11). There is a rough
correlation between the number of acceptable substitutions and the
fractional extent to which the wild-type side chain is solvent
accessible (Fig. 5). At exposed surface positions such as 85, 86, and
89, we find that many different residues and residue types can be
functionally accommodated. By contrast, at positions such as 84 and
87, where the wild-type side chain is almost completely buried, we
find that the functionally acceptable residue choices are extremely
restricted. There is one apparent exception to the simple rule that
buried residues are high in informational content. Ala90 is inaccessible
to solvent in the crystal structure, and yet we find that many
substitutions are allowed at this position. However, the inaccessibility
of the Ala90 side chain to solvent is not due to close packing at
the dimer interface, but rather to an interaction with a nearby
surface side chain. This side chain can presumably move to allow
larger side chains to be accommodated at position 90. Examples of
this type demonstrate the need to distinguish between two types of
buried side chains: those that can become exposed by relatively
minor rearrangement of other side chains, and those that are tightly
packed in the hydrophobic core.

There is no reason to assume that there should always be a strict
correlation between the solvent accessibility of a residue and the
structural informational content of that position. For one thing, the
chemical properties of the 20 amino acids are not related in any
simple linear fashion. Moreover, the structural importance of some
residues in proteins almost certainly stems from interactions other
than simple hydrophobic packing. Nevertheless, the closely packed
nature of protein interiors (23) provides a simple molecular explanation
for the structural importance of buried residues, and destabilizing
mutations are commonly found to affect hydrophobic core
residues (3-7). By contrast, missense mutations or chemical modifications
that affect surface residues are often found to have little or no
influence on protein stability (3, 7, 8). Thus, it is reasonable that
solvent accessibility should be an extremely important determinant
of the informational content of a residue position.

Our overall strategy for rapidly probing informational content
should be broadly applicable to a wide range of protein structure-function
problems in systems where genetic selections or screens can
be devised. The method consists of three basic elements: (i) the use
of cassette mutagenesis to introduce extremely high levels of targeted
random mutagenesis; (ii) the use of a functional selection to
identify genes encoding active proteins; and (iii) the use of rapid
DNA sequencing methods to determine the spectrum of functionally
acceptable residues in a relatively large number of candidates. Our
method of combinatorial cassette mutagenesis (Fig. 2) allows several
residue positions to be mutagenized at the same time and, in
principle, generates a mutant population in which each of the 20
amino acids is represented at each mutagenized position (24). When
two or three codons are mutagenized at the same time, the entire
analysis is able to proceed more rapidly. Moreover, at this level of
mutagenesis most two-residue and three-residue combinations
should be present in the mutagenized population and should be
recovered if they result in a functional protein. In our study of the
packing of the 84 and 87 side chains, we recovered only two (Ile84
with Met87 and Ile84 with Leu87) of the 400 possible residue
combinations. Thus, because both positions were mutagenized in
the same experiment, we are able to conclude that there are not
significantly different ways of packing the dimer interface.
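
The statement that most two-residue and three-residue combinations should be present can be examined with a simple sampling estimate. The sketch below is not the authors' calculation. It assumes that every transformant carries an independently drawn cassette, that the bases are incorporated in exactly equal proportions, and that codons are N, N, G/C as described for Fig. 2. The transformant numbers (50,000 and 60,000) are taken from the text; the names NNS_CODONS, prob_present, and expected_coverage are hypothetical.

```python
from itertools import product

# Codons per amino acid in an N-N-(G/C) mixture (32 codons, one of them a stop).
NNS_CODONS = {
    "A": 2, "R": 3, "N": 1, "D": 1, "C": 1, "Q": 1, "E": 1, "G": 2,
    "H": 1, "I": 1, "L": 3, "K": 1, "M": 1, "F": 1, "P": 2, "S": 3,
    "T": 2, "W": 1, "Y": 1, "V": 2,
}
TOTAL_CODONS = 32


def prob_present(combination, n_clones):
    """Chance that a residue combination occurs at least once in n_clones."""
    p = 1.0
    for residue in combination:
        p *= NNS_CODONS[residue] / TOTAL_CODONS
    return 1.0 - (1.0 - p) ** n_clones


def expected_coverage(n_positions, n_clones):
    """Expected fraction of all residue combinations that are represented."""
    combinations = product(NNS_CODONS, repeat=n_positions)
    total = len(NNS_CODONS) ** n_positions
    return sum(prob_present(c, n_clones) for c in combinations) / total


print("two positions, 50,000 clones:", round(expected_coverage(2, 50_000), 4))
print("three positions, 60,000 clones:", round(expected_coverage(3, 60_000), 4))
print("rarest triple (e.g. Met-Trp-Met):", round(prob_present("MWM", 60_000), 4))
```

Under these assumptions, essentially every two-residue combination is expected among 50,000 transformants, while the rarest three-residue combinations (three single-codon residues) have a chance of about 0.84 of appearing among 60,000. Real libraries may fall short of this if ligation or synthesis is biased.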

In principle, data like that shown in Fig. 3 could be generated for
an entire protein sequence, and additional experiments could be
devised to determine whether the positions of high informational
content were important for structure or function. For proteins of
unknown structure, such data might be quite useful for structural
predictions. First, current predictive algorithms could be applied to
the family of related sequences generated by our method, as each of
these sequences is able to form the same basic structure. Second,
because of their fundamental repeats, α-helical and β-strand regions
might be recognized by characteristic patterns of high and low
informational content. Third, the positions of highest structural
informational content should include the residues involved in
formation of the hydrophobic core of the protein. This information
might prove useful in combination with the tertiary template ideas
recently proposed (25).
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